* [RFC net-next 0/6] net: support QUIC crypto
       [not found] <Adel Abouchaev <adel.abushaev@gmail.com>
@ 2022-08-01 19:52 ` Adel Abouchaev
  2022-08-01 19:52   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
                     ` (6 more replies)
  2022-08-03 16:40 ` Adel Abouchaev
                   ` (6 subsequent siblings)
  7 siblings, 7 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-01 19:52 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, Adel Abouchaev

QUIC requires end-to-end encryption of the data. The application usually
prepares the data in clear text, encrypts it and calls send(), which
implies multiple copies of the data before the packets hit the networking
stack. Similar to kTLS, QUIC kernel crypto offload reduces the memory
pressure by reducing the number of copies.

The scope of kernel support is limited to symmetric cryptography, leaving
the handshake to the user space library. For QUIC in particular, the
application packets that require symmetric cryptography are the 1-RTT
packets with short headers. The kernel will encrypt the application
packets on transmission and decrypt them on receive. This series
implements Tx only, because in QUIC server applications Tx outweighs Rx
by orders of magnitude.
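
To make the scope concrete, here is a minimal user-space sketch of the
classification the offload applies on Tx: only 1-RTT (short header)
packets, recognized by the top two bits of the first byte, are encrypted
in the kernel; everything else falls through to the plain UDP sendmsg()
path. The helper name is illustrative only and is not part of the series:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* 1-RTT short header: Header Form bit (0x80) clear, Fixed Bit (0x40)
   * set, mirroring the (hdr_buf[0] & 0xc0) != 0x40 bypass check in
   * patch 4.  Packets shorter than two bytes are never offloaded.
   */
  static bool quic_is_short_header(const uint8_t *pkt, size_t len)
  {
          return len >= 2 && (pkt[0] & 0xc0) == 0x40;
  }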

Supporting the combination of QUIC and GSO requires the application to
correctly place the data and the kernel to correctly slice it. The
encryption process appends a cipher-dependent number of bytes (the tag)
to the end of each packet to authenticate it. The GSO value should
include this overhead; the offload then subtracts the tag size to parse
the input on Tx before chunking and encrypting it.
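
As an illustration, assuming the 16-byte AEAD tag used throughout this
series (AES-GCM-128), the application-side arithmetic looks like the
sketch below; quic_plaintext_stride() is a hypothetical helper, not an
API added by these patches:

  #include <stddef.h>

  #define QUIC_AEAD_TAG_SIZE 16  /* assumed AES-GCM-128 tag size */

  /* Plaintext packets are laid out back to back every (gso_size - tag)
   * bytes, so that each ciphertext segment produced by the kernel is
   * exactly gso_size bytes, e.g. 1200 - 16 = 1184.
   */
  static inline size_t quic_plaintext_stride(size_t gso_size)
  {
          return gso_size - QUIC_AEAD_TAG_SIZE;
  }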

With the kernel cryptography, the buffer copy operation is combined with
the encryption operation, which reduces memory bandwidth by 5-8%. When
devices supporting QUIC encryption in hardware come to the market, we
will be able to free a further 7% of CPU utilization that is spent today
on crypto operations.


Adel Abouchaev (6):
  Documentation on QUIC kernel Tx crypto.
  Define QUIC specific constants, control and data plane structures
  Add UDP ULP operations, initialization and handling prototype
    functions.
  Implement QUIC offload functions
  Add flow counters and Tx processing error counter
  Add self tests for ULP operations, flow setup and crypto tests

 Documentation/networking/quic.rst      |  176 +++
 include/net/inet_sock.h                |    2 +
 include/net/netns/mib.h                |    3 +
 include/net/quic.h                     |   59 +
 include/net/snmp.h                     |    6 +
 include/net/udp.h                      |   33 +
 include/uapi/linux/quic.h              |   61 +
 include/uapi/linux/snmp.h              |   11 +
 include/uapi/linux/udp.h               |    4 +
 net/Kconfig                            |    1 +
 net/Makefile                           |    1 +
 net/ipv4/Makefile                      |    3 +-
 net/ipv4/udp.c                         |   14 +
 net/ipv4/udp_ulp.c                     |  190 ++++
 net/quic/Kconfig                       |   16 +
 net/quic/Makefile                      |    8 +
 net/quic/quic_main.c                   | 1446 ++++++++++++++++++++++++
 net/quic/quic_proc.c                   |   45 +
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    2 +-
 tools/testing/selftests/net/quic.c     | 1024 +++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   45 +
 22 files changed, 3149 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/networking/quic.rst
 create mode 100644 include/net/quic.h
 create mode 100644 include/uapi/linux/quic.h
 create mode 100644 net/ipv4/udp_ulp.c
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c
 create mode 100644 net/quic/quic_proc.c
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

-- 
2.30.2


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
@ 2022-08-01 19:52   ` Adel Abouchaev
  2022-08-01 19:52   ` [RFC net-next 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-01 19:52 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, Adel Abouchaev

Add the Documentation/networking/quic.rst file to describe the kernel
QUIC code.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 Documentation/networking/quic.rst | 176 ++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 Documentation/networking/quic.rst

diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..eaa2d36310be
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,176 @@
+.. _kernel_quic:
+
+===========
+KERNEL QUIC
+===========
+
+Overview
+========
+
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 for more information on QUIC.
+
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are 1-RTT packets
+in a QUIC connection. Encryption of all other packets is still done by the
+QUIC library in user space.
+
+
+
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+
+A QUIC connection originates and terminates in the application, using one of
+the available QUIC libraries. The code instantiates a QUIC client and a QUIC
+server in some form and configures them to use certain addresses and ports for
+the source and destination. The client and server negotiate the set of keys to
+protect the communication during different phases of the connection, maintain
+the connection and perform congestion control.
+
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow that should be encrypted by the kernel needs to be registered with
+the kernel using the socket API. A setsockopt() call on the socket creates an
+association between the QUIC connection ID of the flow and the encryption
+parameters for the crypto operations:
+
+.. code-block:: c
+  struct quic_connection_info conn_info;
+  char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  const size_t conn_id_len = sizeof(conn_id);
+  char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                       0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+  char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                      0x08, 0x09, 0x0a, 0x0b};
+  char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+                       0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f};
+
+  conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+  memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+  conn_info.key.conn_id_length = 5;
+  memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE - conn_id_len],
+         &conn_id, conn_id_len);
+
+  memcpy(&conn_info.aes_gcm_128.payload_key, conn_key, sizeof(conn_key));
+  memcpy(&conn_info.aes_gcm_128.payload_iv, conn_iv, sizeof(conn_iv));
+  memcpy(&conn_info.aes_gcm_128.header_key, conn_hdr_key, sizeof(conn_hdr_key));
+
+  setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+             sizeof(conn_info));
+
+
+Requesting to remove QUIC Tx kernel crypto offload
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. To explicitly remove the
+offload for a connection during the lifetime of the socket, the process is
+similar to adding the flow. Only the connection ID and its length need to
+be supplied to remove the connection from the offload:
+
+.. code-block:: c
+
+  memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+  conn_info.key.conn_id_length = 5;
+  memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE - conn_id_len],
+         &conn_id, conn_id_len);
+  setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+             sizeof(conn_info));
+
+Sending QUIC application data
+-----------------------------
+
+For QUIC Tx encryption offload, the application should use the sendmsg()
+socket call and provide ancillary data carrying the connection ID length
+and offload flags, so that the kernel can perform the encryption and, if
+requested, GSO.
+
+.. code-block:: c
+
+  size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+  uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+  struct quic_tx_ancillary_data * anc_data;
+  size_t quic_data_len = 4500;
+  struct cmsghdr * cmsg_hdr;
+  char quic_data[9000];
+  struct iovec iov[2];
+  int send_len = 9000;
+  struct msghdr msg;
+  int err;
+
+  iov[0].iov_base = quic_data;
+  iov[0].iov_len = quic_data_len;
+  iov[1].iov_base = quic_data + 4500;
+  iov[1].iov_len = quic_data_len;
+
+  if (client.addr.sin_family == AF_INET) {
+    msg.msg_name = &client.addr;
+    msg.msg_namelen = sizeof(client.addr);
+  } else {
+    msg.msg_name = &client.addr6;
+    msg.msg_namelen = sizeof(client.addr6);
+  }
+
+  msg.msg_iov = iov;
+  msg.msg_iovlen = 2;
+  msg.msg_control = cmsg_buf;
+  msg.msg_controllen = sizeof(cmsg_buf);
+  cmsg_hdr = CMSG_FIRSTHDR(&msg);
+  cmsg_hdr->cmsg_level = IPPROTO_UDP;
+  cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+  cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+  anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+  anc_data->flags = 0;
+  anc_data->next_pkt_num = 0x0d65c9;
+  anc_data->conn_id_length = conn_id_len;
+  err = sendmsg(self->sfd, &msg, 0);
+
+QUIC Tx offload in kernel will read the data from userspace, encrypt and
+copy it to the ciphertext within the same operation.
+
+
+Sending QUIC application data with GSO
+--------------------------------------
+When GSO is in use, the kernel will use the GSO fragment size as the target
+size for the ciphertext. The packets from user space should be aligned on
+the boundary of the GSO fragment size minus the tag size of the chosen
+cipher. For a GSO fragment size of 1200, the plaintext packets should
+follow each other every 1184 bytes, given a tag size of 16. After the
+encryption, the rest of the UDP and IP stack uses the configured GSO
+fragment size, which includes the trailing tag bytes.
+
+To set up GSO fragmentation:
+
+.. code-block:: c
+  setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size, sizeof(frag_size));
+
+If the GSO fragment size is provided in ancillary data within the sendmsg()
+call, the value in ancillary data will take precedence over the segment size
+provided in setsockopt to split the payload into packets. This is consistent
+with the UDP stack behavior.
+
+Integrating to userspace QUIC libraries
+---------------------------------------
+
+Integration with userspace QUIC libraries depends on the implementation of
+the QUIC protocol. For the MVFST library, the control plane is integrated
+into the handshake callbacks to properly configure the flows on the socket,
+and the data plane is integrated into the methods that perform encryption
+and send the packets to the batch scheduler for transmission to the socket.
+
+The MVFST library can be found at https://github.com/facebookincubator/mvfst.
+
+Statistics
+==========
+
+QUIC Tx offload to the kernel has counters reflected in /proc/net/quic_stat:
+
+  QuicCurrTxSw  - number of currently active kernel-offloaded QUIC connections
+  QuicTxSw      - cumulative total number of offloaded QUIC connections
+  QuicTxSwError - cumulative total number of errors during QUIC Tx offload
+                  to the kernel
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 2/6] net: Define QUIC specific constants, control and data plane structures
  2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
  2022-08-01 19:52   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-08-01 19:52   ` Adel Abouchaev
  2022-08-01 19:52   ` [RFC net-next 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-01 19:52 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, Adel Abouchaev

Define the control and data plane structures that are passed via
setsockopt() on the control plane for flow add/remove, and within
ancillary data during packet send. Define the constants used at the
SOL_UDP level to program QUIC sockets.
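
For reference, a minimal sketch of how user space is expected to fill
these structures for AES-GCM-128, following the style of the
documentation examples in patch 1. fd, payload_key, payload_iv and
header_key are placeholders produced by the user-space handshake; the
connection ID placement mirrors the patch 1 example:

  struct quic_connection_info info = { 0 };
  uint8_t cid[5] = { 0x01, 0x02, 0x03, 0x04, 0x05 };

  info.cipher_type = TLS_CIPHER_AES_GCM_128;
  info.key.conn_id_length = sizeof(cid);
  memcpy(&info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE - sizeof(cid)],
         cid, sizeof(cid));
  memcpy(info.aes_gcm_128.payload_key, payload_key,
         sizeof(info.aes_gcm_128.payload_key));
  memcpy(info.aes_gcm_128.payload_iv, payload_iv,
         sizeof(info.aes_gcm_128.payload_iv));
  memcpy(info.aes_gcm_128.header_key, header_key,
         sizeof(info.aes_gcm_128.header_key));

  if (setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
                 &info, sizeof(info)) < 0)
          perror("UDP_QUIC_ADD_TX_CONNECTION");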

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/uapi/linux/quic.h | 61 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/udp.h  |  3 ++
 2 files changed, 64 insertions(+)
 create mode 100644 include/uapi/linux/quic.h

diff --git a/include/uapi/linux/quic.h b/include/uapi/linux/quic.h
new file mode 100644
index 000000000000..79680b8b18a6
--- /dev/null
+++ b/include/uapi/linux/quic.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) */
+
+#ifndef _UAPI_LINUX_QUIC_H
+#define _UAPI_LINUX_QUIC_H
+
+#include <linux/types.h>
+#include <linux/tls.h>
+
+#define QUIC_MAX_CONNECTION_ID_SIZE 20
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_BYPASS_ENCRYPTION 0x01
+
+struct quic_tx_ancillary_data {
+	__aligned_u64	next_pkt_num;
+	__u8	flags;
+	__u8	conn_id_length;
+};
+
+struct quic_connection_info_key {
+	__u8	conn_id[QUIC_MAX_CONNECTION_ID_SIZE];
+	__u8	conn_id_length;
+};
+
+struct quic_aes_gcm_128 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+};
+
+struct quic_aes_gcm_256 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_256_IV_SIZE];
+};
+
+struct quic_aes_ccm_128 {
+	__u8	header_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_CCM_128_IV_SIZE];
+};
+
+struct quic_chacha20_poly1305 {
+	__u8	header_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE];
+};
+
+struct quic_connection_info {
+	__u16	cipher_type;
+	struct quic_connection_info_key		key;
+	union {
+		struct quic_aes_gcm_128 aes_gcm_128;
+		struct quic_aes_gcm_256 aes_gcm_256;
+		struct quic_aes_ccm_128 aes_ccm_128;
+		struct quic_chacha20_poly1305 chacha20_poly1305;
+	};
+};
+
+#endif
+
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 4828794efcf8..0ee4c598e70b 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,9 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
+#define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
+#define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 3/6] net: Add UDP ULP operations, initialization and handling prototype functions.
  2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
  2022-08-01 19:52   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
  2022-08-01 19:52   ` [RFC net-next 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
@ 2022-08-01 19:52   ` Adel Abouchaev
  2022-08-01 19:52   ` [RFC net-next 4/6] net: Implement QUIC offload functions Adel Abouchaev
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-01 19:52 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, Adel Abouchaev

Define functions to add UDP ULP handling, registration with the UDP
protocol and the supporting data structures. Create a structure for the
QUIC ULP and add empty prototype functions to support it.
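
For reviewers, a minimal kernel-side sketch of a module plugging into
this interface; the "example" name and the empty init/release bodies are
illustrative only and not part of the series (the QUIC ULP added later
in the series is the intended user):

  #include <linux/module.h>
  #include <net/udp.h>

  static int example_ulp_init(struct sock *sk)
  {
          return 0;
  }

  static void example_ulp_release(struct sock *sk)
  {
  }

  static struct udp_ulp_ops example_udp_ulp_ops __read_mostly = {
          .name    = "example",
          .owner   = THIS_MODULE,
          .init    = example_ulp_init,
          .release = example_ulp_release,
  };

  static int __init example_ulp_register(void)
  {
          return udp_register_ulp(&example_udp_ulp_ops);
  }

  static void __exit example_ulp_unregister(void)
  {
          udp_unregister_ulp(&example_udp_ulp_ops);
  }

  module_init(example_ulp_register);
  module_exit(example_ulp_unregister);
  MODULE_LICENSE("GPL");
  MODULE_ALIAS_UDP_ULP("example");

User space would then attach such a ULP to a UDP socket with
setsockopt(fd, SOL_UDP, UDP_ULP, "example", sizeof("example")).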

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/net/inet_sock.h  |   2 +
 include/net/udp.h        |  33 +++++++
 include/uapi/linux/udp.h |   1 +
 net/Kconfig              |   1 +
 net/Makefile             |   1 +
 net/ipv4/Makefile        |   3 +-
 net/ipv4/udp.c           |   6 ++
 net/ipv4/udp_ulp.c       | 190 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/udp_ulp.c

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 6395f6b9a5d2..e9c44b3ccffe 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -238,6 +238,8 @@ struct inet_sock {
 	__be32			mc_addr;
 	struct ip_mc_socklist __rcu	*mc_list;
 	struct inet_cork_full	cork;
+	const struct udp_ulp_ops	*udp_ulp_ops;
+	void __rcu		*ulp_data;
 };
 
 #define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
diff --git a/include/net/udp.h b/include/net/udp.h
index 8dd4aa1485a6..f50011a20c92 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -523,4 +523,37 @@ struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
 int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 #endif
 
+/*
+ * Interface for adding Upper Level Protocols over UDP
+ */
+
+#define UDP_ULP_NAME_MAX	16
+#define UDP_ULP_MAX		128
+
+struct udp_ulp_ops {
+	struct list_head	list;
+
+	/* initialize ulp */
+	int (*init)(struct sock *sk);
+	/* cleanup ulp */
+	void (*release)(struct sock *sk);
+
+	char		name[UDP_ULP_NAME_MAX];
+	struct module	*owner;
+};
+
+int udp_register_ulp(struct udp_ulp_ops *type);
+void udp_unregister_ulp(struct udp_ulp_ops *type);
+int udp_set_ulp(struct sock *sk, const char *name);
+void udp_get_available_ulp(char *buf, size_t len);
+void udp_cleanup_ulp(struct sock *sk);
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval,
+		       unsigned int optlen);
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval,
+		       int __user *optlen);
+
+#define MODULE_ALIAS_UDP_ULP(name)\
+	__MODULE_INFO(alias, alias_userspace, name);\
+	__MODULE_INFO(alias, alias_udp_ulp, "udp-ulp-" name)
+
 #endif	/* _UDP_H */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 0ee4c598e70b..893691f0108a 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_ULP		105	/* Attach ULP to a UDP socket */
 #define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
 #define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
 #define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
diff --git a/net/Kconfig b/net/Kconfig
index 6b78f695caa6..93e3b1308aec 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -63,6 +63,7 @@ menu "Networking options"
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/tls/Kconfig"
+source "net/quic/Kconfig"
 source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 source "net/smc/Kconfig"
diff --git a/net/Makefile b/net/Makefile
index fbfeb8a0bb37..28565bfe29cb 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -16,6 +16,7 @@ obj-y				+= ethernet/ 802/ sched/ netlink/ bpf/ ethtool/
 obj-$(CONFIG_NETFILTER)		+= netfilter/
 obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_TLS)		+= tls/
+obj-$(CONFIG_QUIC)		+= quic/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX_SCM)		+= unix/
 obj-y				+= ipv6/
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index bbdd9c44f14e..88d3baf4af95 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,8 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o
+	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o \
+	     udp_ulp.o
 
 obj-$(CONFIG_BPFILTER) += bpfilter/
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aa9f2ec3dc46..e4a5f66b3141 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2778,6 +2778,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->pcflag |= UDPLITE_RECV_CC;
 		break;
 
+	case UDP_ULP:
+		return udp_setsockopt_ulp(sk, optval, optlen);
+
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -2846,6 +2849,9 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
 		val = up->pcrlen;
 		break;
 
+	case UDP_ULP:
+		return udp_getsockopt_ulp(sk, optval, optlen);
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/udp_ulp.c b/net/ipv4/udp_ulp.c
new file mode 100644
index 000000000000..3801ed7ad17d
--- /dev/null
+++ b/net/ipv4/udp_ulp.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Pluggable UDP upper layer protocol support, based on pluggable TCP upper
+ * layer protocol support.
+ *
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwatson@fb.com>. All rights reserved.
+ */
+
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/skmsg.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+
+static DEFINE_SPINLOCK(udp_ulp_list_lock);
+static LIST_HEAD(udp_ulp_list);
+
+/* Simple linear search, don't expect many entries! */
+static struct udp_ulp_ops *udp_ulp_find(const char *name)
+{
+	struct udp_ulp_ops *e;
+
+	list_for_each_entry_rcu(e, &udp_ulp_list, list,
+				lockdep_is_held(&udp_ulp_list_lock)) {
+		if (strcmp(e->name, name) == 0)
+			return e;
+	}
+
+	return NULL;
+}
+
+static const struct udp_ulp_ops *__udp_ulp_find_autoload(const char *name)
+{
+	const struct udp_ulp_ops *ulp = NULL;
+
+	rcu_read_lock();
+	ulp = udp_ulp_find(name);
+
+#ifdef CONFIG_MODULES
+	if (!ulp && capable(CAP_NET_ADMIN)) {
+		rcu_read_unlock();
+		request_module("udp-ulp-%s", name);
+		rcu_read_lock();
+		ulp = udp_ulp_find(name);
+	}
+#endif
+	if (!ulp || !try_module_get(ulp->owner))
+		ulp = NULL;
+
+	rcu_read_unlock();
+	return ulp;
+}
+
+/* Attach new upper layer protocol to the list
+ * of available protocols.
+ */
+int udp_register_ulp(struct udp_ulp_ops *ulp)
+{
+	int ret = 0;
+
+	spin_lock(&udp_ulp_list_lock);
+	if (udp_ulp_find(ulp->name))
+		ret = -EEXIST;
+	else
+		list_add_tail_rcu(&ulp->list, &udp_ulp_list);
+
+	spin_unlock(&udp_ulp_list_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(udp_register_ulp);
+
+void udp_unregister_ulp(struct udp_ulp_ops *ulp)
+{
+	spin_lock(&udp_ulp_list_lock);
+	list_del_rcu(&ulp->list);
+	spin_unlock(&udp_ulp_list_lock);
+
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(udp_unregister_ulp);
+
+void udp_cleanup_ulp(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	/* No sock_owned_by_me() check here as at the time the
+	 * stack calls this function, the socket is dead and
+	 * about to be destroyed.
+	 */
+	if (!inet->udp_ulp_ops)
+		return;
+
+	if (inet->udp_ulp_ops->release)
+		inet->udp_ulp_ops->release(sk);
+	module_put(inet->udp_ulp_ops->owner);
+
+	inet->udp_ulp_ops = NULL;
+}
+
+static int __udp_set_ulp(struct sock *sk, const struct udp_ulp_ops *ulp_ops)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int err;
+
+	err = -EEXIST;
+	if (inet->udp_ulp_ops)
+		goto out_err;
+
+	err = ulp_ops->init(sk);
+	if (err)
+		goto out_err;
+
+	inet->udp_ulp_ops = ulp_ops;
+	return 0;
+
+out_err:
+	module_put(ulp_ops->owner);
+	return err;
+}
+
+int udp_set_ulp(struct sock *sk, const char *name)
+{
+	struct sk_psock *psock = sk_psock_get(sk);
+	const struct udp_ulp_ops *ulp_ops;
+
+	if (psock) {
+		sk_psock_put(sk, psock);
+		return -EINVAL;
+	}
+
+	sock_owned_by_me(sk);
+	ulp_ops = __udp_ulp_find_autoload(name);
+	if (!ulp_ops)
+		return -ENOENT;
+
+	return __udp_set_ulp(sk, ulp_ops);
+}
+
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	char name[UDP_ULP_NAME_MAX];
+	int val, err;
+
+	if (!optlen || optlen > UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	val = strncpy_from_sockptr(name, optval, optlen);
+	if (val < 0)
+		return -EFAULT;
+
+	if (val == UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	name[val] = 0;
+	lock_sock(sk);
+	err = udp_set_ulp(sk, name);
+	release_sock(sk);
+	return err;
+}
+
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval, int __user *optlen)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int len;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	len = min_t(unsigned int, len, UDP_ULP_NAME_MAX);
+	if (len < 0)
+		return -EINVAL;
+
+	if (!inet->udp_ulp_ops) {
+		if (put_user(0, optlen))
+			return -EFAULT;
+		return 0;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, inet->udp_ulp_ops->name, len))
+		return -EFAULT;
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 4/6] net: Implement QUIC offload functions
  2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (2 preceding siblings ...)
  2022-08-01 19:52   ` [RFC net-next 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
@ 2022-08-01 19:52   ` Adel Abouchaev
  2022-08-01 19:52   ` [RFC net-next 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-01 19:52 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, Adel Abouchaev

Add a connection hash to the context to support add and remove operations
on QUIC connections for the control plane and lookup for the data
plane. Implement setsockopt() and add placeholders to add and delete Tx
connections.
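
For context, a hedged user-space sketch of the full Tx flow these hooks
enable, in the style of the patch 1 documentation examples. The ULP name
"quic" is assumed here (the registration itself is not visible in this
excerpt); fd, conn_info, pkt and pkt_len are placeholders:

  /* Attach the ULP, then register the Tx keys for one connection ID
   * (conn_info filled as shown in patch 1/2).
   */
  setsockopt(fd, SOL_UDP, UDP_ULP, "quic", sizeof("quic"));
  setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
             &conn_info, sizeof(conn_info));

  /* Send one plaintext 1-RTT packet; the kernel encrypts it while
   * copying it out of user memory.
   */
  char cbuf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))] = { 0 };
  struct iovec iov = { .iov_base = pkt, .iov_len = pkt_len };
  struct msghdr msg = {
          .msg_iov = &iov, .msg_iovlen = 1,
          .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
  };
  struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
  struct quic_tx_ancillary_data *tx;

  cm->cmsg_level = IPPROTO_UDP;
  cm->cmsg_type = UDP_QUIC_ENCRYPT;
  cm->cmsg_len = CMSG_LEN(sizeof(*tx));
  tx = (struct quic_tx_ancillary_data *)CMSG_DATA(cm);
  tx->flags = 0;
  tx->next_pkt_num = 1;
  tx->conn_id_length = conn_info.key.conn_id_length;

  sendmsg(fd, &msg, 0);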

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/net/quic.h   |   49 ++
 net/ipv4/udp.c       |    8 +
 net/quic/Kconfig     |   16 +
 net/quic/Makefile    |    8 +
 net/quic/quic_main.c | 1400 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1481 insertions(+)
 create mode 100644 include/net/quic.h
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c

diff --git a/include/net/quic.h b/include/net/quic.h
new file mode 100644
index 000000000000..15e04ea08c53
--- /dev/null
+++ b/include/net/quic.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef INCLUDE_NET_QUIC_H
+#define INCLUDE_NET_QUIC_H
+
+#include <linux/mutex.h>
+#include <linux/rhashtable.h>
+#include <linux/skmsg.h>
+#include <uapi/linux/quic.h>
+
+#define QUIC_MAX_SHORT_HEADER_SIZE      25
+#define QUIC_MAX_CONNECTION_ID_SIZE     20
+#define QUIC_HDR_MASK_SIZE              16
+#define QUIC_MAX_GSO_FRAGS              16
+
+// Maximum IV and nonce sizes should be in sync with supported ciphers.
+#define QUIC_CIPHER_MAX_IV_SIZE		12
+#define QUIC_CIPHER_MAX_NONCE_SIZE	16
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_ANCILLARY_FLAGS    (QUIC_BYPASS_ENCRYPTION)
+
+#define QUIC_MAX_IOVEC_SEGMENTS		8
+#define QUIC_MAX_SG_ALLOC_ELEMENTS	32
+#define QUIC_MAX_PLAIN_PAGES		16
+#define QUIC_MAX_CIPHER_PAGES_ORDER	4
+
+struct quic_internal_crypto_context {
+	struct quic_connection_info	conn_info;
+	struct crypto_skcipher		*header_tfm;
+	struct crypto_aead		*packet_aead;
+};
+
+struct quic_connection_rhash {
+	struct rhash_head			node;
+	struct quic_internal_crypto_context	crypto_ctx;
+	struct rcu_head				rcu;
+};
+
+struct quic_context {
+	struct proto		*sk_proto;
+	struct rhashtable	tx_connections;
+	struct scatterlist	sg_alloc[QUIC_MAX_SG_ALLOC_ELEMENTS];
+	struct page		*cipher_page;
+	struct mutex		sendmsg_mux;
+	struct rcu_head		rcu;
+};
+
+#endif
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e4a5f66b3141..d14379b78e42 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,6 +113,7 @@
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
+#include <uapi/linux/quic.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6_stubs.h>
 #endif
@@ -1009,6 +1010,13 @@ static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
 			return -EINVAL;
 		*gso_size = *(__u16 *)CMSG_DATA(cmsg);
 		return 0;
+	case UDP_QUIC_ENCRYPT:
+		/* This option is handled in UDP_ULP and is only checked
+		 * here for the bypass bit
+		 */
+		if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+			return -EINVAL;
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/quic/Kconfig b/net/quic/Kconfig
new file mode 100644
index 000000000000..661cb989508a
--- /dev/null
+++ b/net/quic/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# QUIC configuration
+#
+config QUIC
+	tristate "QUIC encryption offload"
+	depends on INET
+	select CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_GCM
+	help
+	  Enable kernel support for QUIC crypto offload. Currently only Tx
+	  encryption offload is supported. The kernel will perform
+	  copy-during-encryption.
+
+	  If unsure, say N.
diff --git a/net/quic/Makefile b/net/quic/Makefile
new file mode 100644
index 000000000000..928239c4d08c
--- /dev/null
+++ b/net/quic/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the QUIC subsystem
+#
+
+obj-$(CONFIG_QUIC) += quic.o
+
+quic-y := quic_main.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
new file mode 100644
index 000000000000..e738c8130a4f
--- /dev/null
+++ b/net/quic/quic_main.c
@@ -0,0 +1,1400 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <crypto/skcipher.h>
+#include <linux/bug.h>
+#include <linux/module.h>
+#include <linux/rhashtable.h>
+// Include header to use TLS constants for AEAD cipher.
+#include <net/tls.h>
+#include <net/quic.h>
+#include <net/udp.h>
+#include <uapi/linux/quic.h>
+
+static unsigned long af_init_done;
+static struct proto quic_v4_proto;
+static struct proto quic_v6_proto;
+static DEFINE_SPINLOCK(quic_proto_lock);
+
+static u32 quic_tx_connection_hash(const void *data, u32 len, u32 seed)
+{
+	return jhash(data, len, seed);
+}
+
+static u32 quic_tx_connection_hash_obj(const void *data, u32 len, u32 seed)
+{
+	const struct quic_connection_rhash *connhash = data;
+
+	return jhash(&connhash->crypto_ctx.conn_info.key,
+		     sizeof(struct quic_connection_info_key), seed);
+}
+
+static int quic_tx_connection_hash_cmp(struct rhashtable_compare_arg *arg,
+				       const void *ptr)
+{
+	const struct quic_connection_info_key *key = arg->key;
+	const struct quic_connection_rhash *x = ptr;
+
+	return !!memcmp(&x->crypto_ctx.conn_info.key,
+			key,
+			sizeof(struct quic_connection_info_key));
+}
+
+static const struct rhashtable_params quic_tx_connection_params = {
+	.key_len		= sizeof(struct quic_connection_info_key),
+	.head_offset		= offsetof(struct quic_connection_rhash, node),
+	.hashfn			= quic_tx_connection_hash,
+	.obj_hashfn		= quic_tx_connection_hash_obj,
+	.obj_cmpfn		= quic_tx_connection_hash_cmp,
+	.automatic_shrinking	= true,
+};
+
+static inline size_t quic_crypto_key_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_KEY_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_KEY_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_KEY_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return 0;
+}
+
+static inline size_t quic_crypto_tag_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_TAG_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_TAG_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return 0;
+}
+
+static inline size_t quic_crypto_iv_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return 0;
+}
+
+static inline size_t quic_crypto_nonce_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_128_SALT_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_256_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_256_SALT_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_CCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_CCM_128_SALT_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+			     TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+		       TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return 0;
+}
+
+static inline
+u8 *quic_payload_iv(struct quic_internal_crypto_context *crypto_ctx)
+{
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return crypto_ctx->conn_info.aes_gcm_128.payload_iv;
+	case TLS_CIPHER_AES_GCM_256:
+		return crypto_ctx->conn_info.aes_gcm_256.payload_iv;
+	case TLS_CIPHER_AES_CCM_128:
+		return crypto_ctx->conn_info.aes_ccm_128.payload_iv;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return crypto_ctx->conn_info.chacha20_poly1305.payload_iv;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return NULL;
+}
+
+static int
+quic_config_header_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_skcipher *tfm;
+	char *header_cipher;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_128.header_key;
+		break;
+	case TLS_CIPHER_AES_GCM_256:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_256.header_key;
+		break;
+	case TLS_CIPHER_AES_CCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_ccm_128.header_key;
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		header_cipher = "chacha20";
+		key = crypto_ctx->conn_info.chacha20_poly1305.header_key;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	tfm = crypto_alloc_skcipher(header_cipher, 0, 0);
+	if (IS_ERR(tfm)) {
+		rc = PTR_ERR(tfm);
+		goto out;
+	}
+
+	rc = crypto_skcipher_setkey(tfm, key,
+				    quic_crypto_key_size(crypto_ctx->conn_info
+							 .cipher_type));
+	if (rc) {
+		crypto_free_skcipher(tfm);
+		goto out;
+	}
+
+	crypto_ctx->header_tfm = tfm;
+
+out:
+	return rc;
+}
+
+static int
+quic_config_packet_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_aead *aead;
+	char *cipher_name;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128: {
+		key = crypto_ctx->conn_info.aes_gcm_128.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_GCM_256: {
+		key = crypto_ctx->conn_info.aes_gcm_256.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_CCM_128: {
+		key = crypto_ctx->conn_info.aes_ccm_128.payload_key;
+		cipher_name = "ccm(aes)";
+		break;
+	}
+	case TLS_CIPHER_CHACHA20_POLY1305: {
+		key = crypto_ctx->conn_info.chacha20_poly1305.payload_key;
+		cipher_name = "rfc7539(chacha20,poly1305)";
+		break;
+	}
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	aead = crypto_alloc_aead(cipher_name, 0, 0);
+	if (IS_ERR(aead)) {
+		rc = PTR_ERR(aead);
+		goto out;
+	}
+
+	rc = crypto_aead_setkey(aead, key,
+				quic_crypto_key_size(crypto_ctx->conn_info
+						     .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	rc = crypto_aead_setauthsize(aead,
+				     quic_crypto_tag_size(crypto_ctx->conn_info
+							  .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	crypto_ctx->packet_aead = aead;
+	goto out;
+
+free_aead:
+	crypto_free_aead(aead);
+
+out:
+	return rc;
+}
+
+static inline struct quic_context *quic_get_ctx(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	return (__force void *)rcu_access_pointer(inet->ulp_data);
+}
+
+static void quic_free_cipher_page(struct page *page)
+{
+	__free_pages(page, QUIC_MAX_CIPHER_PAGES_ORDER);
+}
+
+static struct quic_context *quic_ctx_create(void)
+{
+	struct quic_context *ctx;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+
+	mutex_init(&ctx->sendmsg_mux);
+	ctx->cipher_page = alloc_pages(GFP_KERNEL, QUIC_MAX_CIPHER_PAGES_ORDER);
+	if (!ctx->cipher_page)
+		goto out_err;
+
+	if (rhashtable_init(&ctx->tx_connections,
+			    &quic_tx_connection_params) < 0) {
+		quic_free_cipher_page(ctx->cipher_page);
+		goto out_err;
+	}
+
+	return ctx;
+
+out_err:
+	kfree(ctx);
+	return NULL;
+}
+
+static int quic_getsockopt(struct sock *sk, int level, int optname,
+			   char __user *optval, int __user *optlen)
+{
+	struct quic_context *ctx = quic_get_ctx(sk);
+
+	return ctx->sk_proto->getsockopt(sk, level, optname, optval, optlen);
+}
+
+static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	int rc = 0;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	connhash = kzalloc(sizeof(*connhash), GFP_KERNEL);
+	if (!connhash)
+		return -ENOMEM;
+
+	crypto_ctx = &connhash->crypto_ctx;
+	rc = copy_from_sockptr(&crypto_ctx->conn_info, optval,
+			       sizeof(crypto_ctx->conn_info));
+	if (rc) {
+		rc = -EFAULT;
+		goto err_crypto_info;
+	}
+
+	// create all TLS materials for packet and header decryption
+	rc = quic_config_header_crypto(crypto_ctx);
+	if (rc)
+		goto err_crypto_info;
+
+	rc = quic_config_packet_crypto(crypto_ctx);
+	if (rc)
+		goto err_free_skcipher;
+
+	// insert crypto data into hash per connection ID
+	rc = rhashtable_insert_fast(&ctx->tx_connections, &connhash->node,
+				    quic_tx_connection_params);
+	if (rc < 0)
+		goto err_free_ciphers;
+
+	return 0;
+
+err_free_ciphers:
+	crypto_free_aead(crypto_ctx->packet_aead);
+
+err_free_skcipher:
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+
+err_crypto_info:
+	// wipe out all crypto material
+	memzero_explicit(&connhash->crypto_ctx, sizeof(connhash->crypto_ctx));
+	kfree(connhash);
+	return rc;
+}
+
+static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	struct quic_connection_info conn_info;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&conn_info, optval, optlen))
+		return -EFAULT;
+
+	connhash = rhashtable_lookup_fast(&ctx->tx_connections,
+					  &conn_info.key,
+					  quic_tx_connection_params);
+	if (!connhash)
+		return -EINVAL;
+
+	rhashtable_remove_fast(&ctx->tx_connections,
+			       &connhash->node,
+			       quic_tx_connection_params);
+
+
+	crypto_ctx = &connhash->crypto_ctx;
+
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+	crypto_free_aead(crypto_ctx->packet_aead);
+	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	kfree(connhash);
+
+	return 0;
+}
+
+static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
+			      unsigned int optlen)
+{
+	int rc = 0;
+
+	switch (optname) {
+	case UDP_QUIC_ADD_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_add_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	case UDP_QUIC_DEL_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_del_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	default:
+		rc = -ENOPROTOOPT;
+		break;
+	}
+
+	return rc;
+}
+
+static int quic_setsockopt(struct sock *sk, int level, int optname,
+			   sockptr_t optval, unsigned int optlen)
+{
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	sk_proto = ctx->sk_proto;
+	rcu_read_unlock();
+
+	if (level == SOL_UDP &&
+	    (optname == UDP_QUIC_ADD_TX_CONNECTION ||
+	     optname == UDP_QUIC_DEL_TX_CONNECTION))
+		return do_quic_setsockopt(sk, optname, optval, optlen);
+
+	return sk_proto->setsockopt(sk, level, optname, optval, optlen);
+}
+
+static int
+quic_extract_ancillary_data(struct msghdr *msg,
+			    struct quic_tx_ancillary_data *ancillary_data,
+			    u16 *udp_pkt_size)
+{
+	struct cmsghdr *cmsg_hdr = NULL;
+	void *ancillary_data_ptr = NULL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	for_each_cmsghdr(cmsg_hdr, msg) {
+		if (!CMSG_OK(msg, cmsg_hdr))
+			return -EINVAL;
+
+		if (cmsg_hdr->cmsg_level != IPPROTO_UDP)
+			continue;
+
+		if (cmsg_hdr->cmsg_type == UDP_QUIC_ENCRYPT) {
+			if (cmsg_hdr->cmsg_len !=
+			    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+				return -EINVAL;
+			memcpy((void *)ancillary_data, CMSG_DATA(cmsg_hdr),
+			       sizeof(struct quic_tx_ancillary_data));
+			ancillary_data_ptr = cmsg_hdr;
+		} else if (cmsg_hdr->cmsg_type == UDP_SEGMENT) {
+			if (cmsg_hdr->cmsg_len != CMSG_LEN(sizeof(u16)))
+				return -EINVAL;
+			memcpy((void *)udp_pkt_size, CMSG_DATA(cmsg_hdr),
+			       sizeof(u16));
+		}
+	}
+
+	if (!ancillary_data_ptr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int quic_sendmsg_validate(struct msghdr *msg)
+{
+	if (!iter_is_iovec(&msg->msg_iter))
+		return -EINVAL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct quic_connection_rhash
+*quic_lookup_connection(struct quic_context *ctx,
+			u8 *conn_id,
+			struct quic_tx_ancillary_data *ancillary_data)
+{
+	struct quic_connection_info_key conn_key;
+
+	// Lookup connection information by the connection key.
+	memset(&conn_key, 0, sizeof(struct quic_connection_info_key));
+	// fill the connection id up to the max connection ID length
+	if (ancillary_data->conn_id_length > QUIC_MAX_CONNECTION_ID_SIZE)
+		return NULL;
+
+	conn_key.conn_id_length = ancillary_data->conn_id_length;
+	if (ancillary_data->conn_id_length)
+		memcpy(conn_key.conn_id,
+		       conn_id,
+		       ancillary_data->conn_id_length);
+	return rhashtable_lookup_fast(&ctx->tx_connections,
+				      &conn_key,
+				      quic_tx_connection_params);
+}
+
+static int quic_sg_capacity_from_msg(const size_t pkt_size,
+				     const off_t offset,
+				     const size_t length)
+{
+	size_t	pages = 0;
+	size_t	pkts = 0;
+
+	pages = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+	pkts = DIV_ROUND_UP(length, pkt_size);
+	return pages + pkts + 1;
+}
+
+static void quic_put_plain_user_pages(struct page **pages, size_t nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (i == 0 || pages[i] != pages[i - 1])
+			put_page(pages[i]);
+}
+
+static int quic_get_plain_user_pages(struct msghdr * const msg,
+				     struct page **pages,
+				     int *page_indices)
+{
+	size_t	nr_mapped = 0;
+	size_t	nr_pages = 0;
+	void	*data_addr;
+	void	*page_addr;
+	size_t	count = 0;
+	off_t	data_off;
+	int	ret = 0;
+	int	i;
+
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		data_addr = msg->msg_iter.iov[i].iov_base;
+		if (!i)
+			data_addr += msg->msg_iter.iov_offset;
+		page_addr =
+			(void *)((unsigned long)data_addr & PAGE_MASK);
+
+		data_off = (unsigned long)data_addr & ~PAGE_MASK;
+		nr_pages =
+			DIV_ROUND_UP(data_off + msg->msg_iter.iov[i].iov_len,
+				     PAGE_SIZE);
+		if (nr_mapped + nr_pages > QUIC_MAX_PLAIN_PAGES) {
+			quic_put_plain_user_pages(pages, nr_mapped);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		count = get_user_pages((unsigned long)page_addr, nr_pages, 1,
+				       pages, NULL);
+		if (count < nr_pages) {
+			quic_put_plain_user_pages(pages, nr_mapped + count);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		page_indices[i] = nr_mapped;
+		nr_mapped += count;
+		pages += count;
+	}
+	ret = nr_mapped;
+
+out:
+	return ret;
+}
+
+static int quic_sg_plain_from_mapped_msg(struct msghdr * const msg,
+					 struct page **plain_pages,
+					 void **iov_base_ptrs,
+					 void **iov_data_ptrs,
+					 const size_t plain_size,
+					 const size_t pkt_size,
+					 struct scatterlist * const sg_alloc,
+					 const size_t max_sg_alloc,
+					 struct scatterlist ** const sg_pkts,
+					 size_t *nr_plain_pages)
+{
+	int iov_page_indices[QUIC_MAX_IOVEC_SEGMENTS];
+	struct scatterlist *sg;
+	unsigned int pkt_i = 0;
+	ssize_t left_on_page;
+	size_t pkt_left;
+	unsigned int i;
+	size_t seg_len;
+	off_t page_ofs;
+	off_t seg_ofs;
+	int ret = 0;
+	int page_i;
+
+	if (msg->msg_iter.nr_segs >= QUIC_MAX_IOVEC_SEGMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = quic_get_plain_user_pages(msg, plain_pages, iov_page_indices);
+	if (ret < 0)
+		goto out;
+
+	*nr_plain_pages = ret;
+	sg = sg_alloc;
+	sg_pkts[pkt_i] = sg;
+	sg_unmark_end(sg);
+	pkt_left = pkt_size;
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		page_ofs = ((unsigned long)msg->msg_iter.iov[i].iov_base
+			   & (PAGE_SIZE - 1));
+		page_i = 0;
+		if (!i) {
+			page_ofs += msg->msg_iter.iov_offset;
+			while (page_ofs >= PAGE_SIZE) {
+				page_ofs -= PAGE_SIZE;
+				page_i++;
+			}
+		}
+
+		seg_len = msg->msg_iter.iov[i].iov_len;
+		page_i += iov_page_indices[i];
+
+		if (page_i >= QUIC_MAX_PLAIN_PAGES)
+			return -EFAULT;
+
+		seg_ofs = 0;
+		while (seg_ofs < seg_len) {
+			if (sg - sg_alloc > max_sg_alloc)
+				return -EFAULT;
+
+			sg_unmark_end(sg);
+			left_on_page = min_t(size_t, PAGE_SIZE - page_ofs,
+					     seg_len - seg_ofs);
+			if (left_on_page <= 0)
+				return -EFAULT;
+
+			if (left_on_page > pkt_left) {
+				sg_set_page(sg, plain_pages[page_i], pkt_left,
+					    page_ofs);
+				pkt_i++;
+				seg_ofs += pkt_left;
+				page_ofs += pkt_left;
+				sg_mark_end(sg);
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+				continue;
+			}
+			sg_set_page(sg, plain_pages[page_i], left_on_page,
+				    page_ofs);
+			page_i++;
+			page_ofs = 0;
+			seg_ofs += left_on_page;
+			pkt_left -= left_on_page;
+			if (pkt_left == 0 ||
+			    (seg_ofs == seg_len &&
+			     i == msg->msg_iter.nr_segs - 1)) {
+				sg_mark_end(sg);
+				pkt_i++;
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+			} else {
+				sg++;
+			}
+		}
+	}
+
+	if (pkt_left && pkt_left != pkt_size) {
+		pkt_i++;
+		sg_mark_end(sg);
+	}
+	ret = pkt_i;
+
+out:
+	return ret;
+}
+
+/* sg_alloc: allocated zeroed array of scatterlists
+ * cipher_page: preallocated compound page
+ */
+static int quic_sg_cipher_from_pkts(const size_t cipher_tag_size,
+				     const size_t plain_pkt_size,
+				     const size_t plain_size,
+				     struct page * const cipher_page,
+				     struct scatterlist * const sg_alloc,
+				     const size_t nr_sg_alloc,
+				     struct scatterlist ** const sg_cipher)
+{
+	const size_t cipher_pkt_size = plain_pkt_size + cipher_tag_size;
+	size_t pkts = DIV_ROUND_UP(plain_size, plain_pkt_size);
+	struct scatterlist *sg = sg_alloc;
+	int pkt_i;
+	void *ptr;
+
+	if (pkts > nr_sg_alloc)
+		return -EINVAL;
+
+	ptr = page_address(cipher_page);
+	for (pkt_i = 0; pkt_i < pkts;
+		++pkt_i, ptr += cipher_pkt_size, ++sg) {
+		sg_set_buf(sg, ptr, cipher_pkt_size);
+		sg_mark_end(sg);
+		sg_cipher[pkt_i] = sg;
+	}
+	return pkts;
+}
+
+/* fast copy from scatterlist to a buffer assuming that all pages are
+ * available in kernel memory.
+ */
+static int quic_sg_pcopy_to_buffer_kernel(struct scatterlist *sg,
+					  u8 *buffer,
+					  size_t bytes_to_copy,
+					  off_t offset_to_read)
+{
+	off_t sg_remain = sg->length;
+	size_t to_copy;
+
+	if (!bytes_to_copy)
+		return 0;
+
+	// skip to offset first
+	while (offset_to_read > 0) {
+		if (!sg_remain)
+			return -EINVAL;
+		if (offset_to_read < sg_remain) {
+			sg_remain -= offset_to_read;
+			break;
+		}
+		offset_to_read -= sg_remain;
+		sg = sg_next(sg);
+		if (!sg)
+			return -EINVAL;
+		sg_remain = sg->length;
+	}
+
+	// traverse sg list from offset to offset + bytes_to_copy
+	while (bytes_to_copy) {
+		to_copy = min_t(size_t, bytes_to_copy, sg_remain);
+		if (!to_copy)
+			return -EINVAL;
+		memcpy(buffer, sg_virt(sg) + (sg->length - sg_remain), to_copy);
+		buffer += to_copy;
+		bytes_to_copy -= to_copy;
+		if (bytes_to_copy) {
+			sg = sg_next(sg);
+			if (!sg)
+				return -EINVAL;
+			sg_remain = sg->length;
+		}
+	}
+
+	return 0;
+}
+
+static int quic_copy_header(struct scatterlist *sg_plain,
+			    u8 *buf, const size_t buf_len,
+			    const size_t conn_id_len)
+{
+	u8 *pkt = sg_virt(sg_plain);
+	size_t hdr_len;
+
+	hdr_len = 1 + conn_id_len + ((*pkt & 0x03) + 1);
+	if (hdr_len > QUIC_MAX_SHORT_HEADER_SIZE || hdr_len > buf_len)
+		return -EINVAL;
+
+	WARN_ON_ONCE(quic_sg_pcopy_to_buffer_kernel(sg_plain, buf, hdr_len, 0));
+	return hdr_len;
+}
+
+static u64 quic_unpack_pkt_num(struct quic_tx_ancillary_data * const control,
+			       const u8 * const hdr,
+			       const off_t payload_crypto_off)
+{
+	u64 truncated_pn = 0;
+	u64 candidate_pn;
+	u64 expected_pn;
+	u64 pn_hwin;
+	u64 pn_mask;
+	u64 pn_len;
+	u64 pn_win;
+	int i;
+
+	pn_len = (hdr[0] & 0x03) + 1;
+	expected_pn = control->next_pkt_num;
+
+	for (i = 1 + control->conn_id_length; i < payload_crypto_off; ++i) {
+		truncated_pn <<= 8;
+		truncated_pn |= hdr[i];
+	}
+
+	pn_win = 1ULL << (pn_len << 3);
+	pn_hwin = pn_win >> 1;
+	pn_mask = pn_win - 1;
+	candidate_pn = (expected_pn & ~pn_mask) | truncated_pn;
+
+	if (expected_pn > pn_hwin &&
+	    candidate_pn <= expected_pn - pn_hwin &&
+	    candidate_pn < (1ULL << 62) - pn_win)
+		return candidate_pn + pn_win;
+
+	if (candidate_pn > expected_pn + pn_hwin &&
+	    candidate_pn >= pn_win)
+		return candidate_pn - pn_win;
+
+	return candidate_pn;
+}
+
+static int
+quic_construct_header_prot_mask(struct quic_internal_crypto_context *crypto_ctx,
+				struct skcipher_request *hdr_mask_req,
+				struct scatterlist *sg_cipher_pkt,
+				off_t sample_offset,
+				u8 *hdr_mask)
+{
+	u8 *sample = sg_virt(sg_cipher_pkt) + sample_offset;
+	u8 hdr_ctr[sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE];
+	struct scatterlist sg_cipher_sample;
+	struct scatterlist sg_hdr_mask;
+	struct crypto_wait wait_header;
+	u32	counter;
+
+	BUILD_BUG_ON(QUIC_HDR_MASK_SIZE
+		     < sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE);
+
+	// cipher pages are continuous, get the pointer to the sg data directly,
+	// page is allocated in kernel
+	sg_init_one(&sg_cipher_sample, sample, QUIC_HDR_MASK_SIZE);
+	sg_init_one(&sg_hdr_mask, hdr_mask, QUIC_HDR_MASK_SIZE);
+	skcipher_request_set_callback(hdr_mask_req, 0, crypto_req_done,
+				      &wait_header);
+
+	if (crypto_ctx->conn_info.cipher_type == TLS_CIPHER_CHACHA20_POLY1305) {
+		counter = cpu_to_le32(*((u32 *)sample));
+		memset(hdr_ctr, 0, sizeof(hdr_ctr));
+		memcpy((u8 *)hdr_ctr, (u8 *)&counter, sizeof(u32));
+		memcpy((u8 *)hdr_ctr + sizeof(u32),
+		       (sample + sizeof(u32)),
+		       QUIC_CIPHER_MAX_IV_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, 5, hdr_ctr);
+	} else {
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, QUIC_HDR_MASK_SIZE,
+					   NULL);
+	}
+
+	return crypto_wait_req(crypto_skcipher_encrypt(hdr_mask_req),
+			       &wait_header);
+}
+
+static int quic_protect_header(struct quic_internal_crypto_context *crypto_ctx,
+			       struct quic_tx_ancillary_data *control,
+			       struct skcipher_request *hdr_mask_req,
+			       struct scatterlist *sg_cipher_pkt,
+			       int payload_crypto_off)
+{
+	u8 hdr_mask[QUIC_HDR_MASK_SIZE];
+	off_t quic_pkt_num_off;
+	u8 quic_pkt_num_len;
+	u8 *cipher_hdr;
+	int err;
+	int i;
+
+	quic_pkt_num_off = 1 + control->conn_id_length;
+	quic_pkt_num_len = payload_crypto_off - quic_pkt_num_off;
+
+	if (quic_pkt_num_len > 4)
+		return -EPERM;
+
+	err = quic_construct_header_prot_mask(crypto_ctx, hdr_mask_req,
+					      sg_cipher_pkt,
+					      payload_crypto_off +
+					      (4 - quic_pkt_num_len),
+					      hdr_mask);
+	if (unlikely(err))
+		return err;
+
+	cipher_hdr = sg_virt(sg_cipher_pkt);
+	// protect the public flags
+	cipher_hdr[0] ^= (hdr_mask[0] & 0x1f);
+
+	for (i = 0; i < quic_pkt_num_len; ++i)
+		cipher_hdr[quic_pkt_num_off + i] ^= hdr_mask[1 + i];
+
+	return 0;
+}
+
+static
+void quic_construct_ietf_nonce(u8 *nonce,
+			       struct quic_internal_crypto_context *crypto_ctx,
+			       u64 quic_pkt_num)
+{
+	u8 *iv = quic_payload_iv(crypto_ctx);
+	int i;
+
+	for (i = quic_crypto_nonce_size(crypto_ctx->conn_info.cipher_type) - 1;
+	     i >= 0 && quic_pkt_num;
+	     --i, quic_pkt_num >>= 8)
+		nonce[i] = iv[i] ^ (u8)quic_pkt_num;
+
+	for (; i >= 0; --i)
+		nonce[i] = iv[i];
+}
+
+ssize_t quic_sendpage(struct quic_context *ctx,
+		      struct sock *sk,
+		      struct msghdr *msg,
+		      const size_t cipher_size,
+		      struct page * const cipher_page)
+{
+	struct kvec iov;
+	ssize_t ret;
+
+	iov.iov_base = page_address(cipher_page);
+	iov.iov_len = cipher_size;
+	iov_iter_kvec(&msg->msg_iter, WRITE, &iov, 1,
+		      cipher_size);
+	ret = security_socket_sendmsg(sk->sk_socket, msg, msg_data_left(msg));
+	if (ret)
+		return ret;
+
+	ret = ctx->sk_proto->sendmsg(sk, msg, msg_data_left(msg));
+	WARN_ON(ret == -EIOCBQUEUED);
+	return ret;
+}
+
+static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_internal_crypto_context *crypto_ctx = NULL;
+	struct scatterlist *sg_cipher_pkts[QUIC_MAX_GSO_FRAGS];
+	struct scatterlist *sg_plain_pkts[QUIC_MAX_GSO_FRAGS];
+	struct page *plain_pages[QUIC_MAX_PLAIN_PAGES];
+	void *plain_base_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	void *plain_data_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	struct msghdr msg_cipher = {
+		.msg_name = msg->msg_name,
+		.msg_namelen = msg->msg_namelen,
+		.msg_flags = msg->msg_flags,
+		.msg_control = msg->msg_control,
+		.msg_controllen = msg->msg_controllen,
+	};
+	struct quic_connection_rhash *connhash = NULL;
+	struct quic_connection_info *conn_info = NULL;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	u8 hdr_buf[QUIC_MAX_SHORT_HEADER_SIZE];
+	struct skcipher_request *hdr_mask_req;
+	struct quic_tx_ancillary_data control;
+	u8 nonce[QUIC_CIPHER_MAX_NONCE_SIZE];
+	struct aead_request *aead_req = NULL;
+	struct scatterlist *sg_cipher = NULL;
+	struct udp_sock *up = udp_sk(sk);
+	struct scatterlist *sg_plain = NULL;
+	u16 gso_pkt_size = up->gso_size;
+	size_t last_plain_pkt_size = 0;
+	off_t	payload_crypto_offset;
+	struct crypto_aead *tfm = NULL;
+	size_t nr_plain_pages = 0;
+	struct crypto_wait waiter;
+	size_t nr_sg_cipher_pkts;
+	size_t nr_sg_plain_pkts;
+	ssize_t hdr_buf_len = 0;
+	size_t nr_sg_alloc = 0;
+	size_t plain_pkt_size;
+	u64 full_pkt_num;
+	size_t cipher_size;
+	size_t plain_size;
+	size_t pkt_size;
+	size_t tag_size;
+	int ret = 0;
+	int pkt_i;
+	int err;
+
+	memset(&hdr_buf[0], 0, QUIC_MAX_SHORT_HEADER_SIZE);
+	hdr_buf_len = copy_from_iter(hdr_buf, QUIC_MAX_SHORT_HEADER_SIZE,
+				     &msg->msg_iter);
+	if (hdr_buf_len <= 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+	iov_iter_revert(&msg->msg_iter, hdr_buf_len);
+
+	// Bypass anything that is guaranteed not to be QUIC.
+	plain_size = len;
+
+	if (plain_size < 2)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Bypass for other than short header.
+	if ((hdr_buf[0] & 0xc0) != 0x40)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Crypto adds a tag after the packet. Corking a payload would produce
+	// a crypto tag after each portion. Use GSO instead.
+	if ((msg->msg_flags & MSG_MORE) || up->pending) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = quic_sendmsg_validate(msg);
+	if (ret)
+		goto out;
+
+	ret = quic_extract_ancillary_data(msg, &control, &gso_pkt_size);
+	if (ret)
+		goto out;
+
+	// Reserved bits with ancillary data present are an error.
+	if (control.flags & ~QUIC_ANCILLARY_FLAGS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Bypass offload on request. First packet bypass applies to all
+	// packets in the GSO pack.
+	if (control.flags & QUIC_BYPASS_ENCRYPTION)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	if (hdr_buf_len < 1 + control.conn_id_length) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Fetch the flow
+	connhash = quic_lookup_connection(ctx, &hdr_buf[1], &control);
+	if (!connhash) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	crypto_ctx = &connhash->crypto_ctx;
+	conn_info = &crypto_ctx->conn_info;
+
+	tag_size = quic_crypto_tag_size(crypto_ctx->conn_info.cipher_type);
+
+	// For GSO, use the GSO size minus cipher tag size as the packet size;
+	// for non-GSO, use the size of the whole plaintext.
+	// Reduce the packet size by tag size to keep the original packet size
+	// for the rest of the UDP path in the stack.
+	if (!gso_pkt_size) {
+		plain_pkt_size = plain_size;
+	} else {
+		if (gso_pkt_size <= tag_size) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		plain_pkt_size = gso_pkt_size - tag_size;
+	}
+
+	// Build scatterlist from the input data, split by GSO minus the
+	// crypto tag size.
+	nr_sg_alloc = quic_sg_capacity_from_msg(plain_pkt_size,
+						msg->msg_iter.iov_offset,
+						plain_size);
+	if ((nr_sg_alloc * 2) >= QUIC_MAX_SG_ALLOC_ELEMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	sg_plain = ctx->sg_alloc;
+	sg_cipher = sg_plain + nr_sg_alloc;
+
+	ret = quic_sg_plain_from_mapped_msg(msg, plain_pages,
+					    plain_base_ptrs,
+					    plain_data_ptrs, plain_size,
+					    plain_pkt_size, sg_plain,
+					    nr_sg_alloc, sg_plain_pkts,
+					    &nr_plain_pages);
+
+	if (ret < 0)
+		goto out;
+
+	nr_sg_plain_pkts = ret;
+	last_plain_pkt_size = plain_size % plain_pkt_size;
+	if (!last_plain_pkt_size)
+		last_plain_pkt_size = plain_pkt_size;
+
+	// Build scatterlist for the ciphertext, split by GSO.
+	cipher_size = plain_size + nr_sg_plain_pkts * tag_size;
+
+	if (DIV_ROUND_UP(cipher_size, PAGE_SIZE)
+	    >= (1 << QUIC_MAX_CIPHER_PAGES_ORDER)) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	ret = quic_sg_cipher_from_pkts(tag_size, plain_pkt_size, plain_size,
+				       ctx->cipher_page, sg_cipher, nr_sg_alloc,
+				       sg_cipher_pkts);
+	if (ret < 0)
+		goto out_put_pages;
+
+	nr_sg_cipher_pkts = ret;
+
+	if (nr_sg_plain_pkts != nr_sg_cipher_pkts) {
+		ret = -EPERM;
+		goto out_put_pages;
+	}
+
+	// Encrypt and protect header for each packet individually.
+	tfm = crypto_ctx->packet_aead;
+	crypto_aead_clear_flags(tfm, ~0);
+	aead_req = aead_request_alloc(tfm, GFP_KERNEL);
+	if (!aead_req) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	hdr_mask_req = skcipher_request_alloc(crypto_ctx->header_tfm,
+					      GFP_KERNEL);
+	if (!hdr_mask_req) {
+		aead_request_free(aead_req);
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	for (pkt_i = 0; pkt_i < nr_sg_plain_pkts; ++pkt_i) {
+		payload_crypto_offset =
+			quic_copy_header(sg_plain_pkts[pkt_i],
+					 hdr_buf,
+					 sizeof(hdr_buf),
+					 control.conn_id_length);
+
+		full_pkt_num = quic_unpack_pkt_num(&control, hdr_buf,
+						   payload_crypto_offset);
+
+		pkt_size = pkt_i + 1 < nr_sg_plain_pkts
+				? plain_pkt_size
+				: last_plain_pkt_size;
+		/* pkt_size is unsigned; check before subtracting the
+		 * header bytes to avoid wrapping below zero.
+		 */
+		if (pkt_size < payload_crypto_offset) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		pkt_size -= payload_crypto_offset;
+
+		/* Construct nonce and initialize request */
+		quic_construct_ietf_nonce(nonce, crypto_ctx, full_pkt_num);
+
+		/* Encrypt the body */
+		aead_request_set_callback(aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG
+					  | CRYPTO_TFM_REQ_MAY_SLEEP,
+					  crypto_req_done, &waiter);
+		aead_request_set_crypt(aead_req, sg_plain_pkts[pkt_i],
+				       sg_cipher_pkts[pkt_i],
+				       pkt_size,
+				       nonce);
+		aead_request_set_ad(aead_req, payload_crypto_offset);
+		err = crypto_wait_req(crypto_aead_encrypt(aead_req), &waiter);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+
+		/* Protect the header */
+		memcpy(sg_virt(sg_cipher_pkts[pkt_i]), hdr_buf,
+		       payload_crypto_offset);
+
+		err = quic_protect_header(crypto_ctx, &control,
+					  hdr_mask_req,
+					  sg_cipher_pkts[pkt_i],
+					  payload_crypto_offset);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+	}
+	skcipher_request_free(hdr_mask_req);
+	aead_request_free(aead_req);
+
+	// Deliver to the next layer.
+	if (ctx->sk_proto->sendpage) {
+		msg_cipher.msg_flags |= MSG_MORE;
+		err = ctx->sk_proto->sendmsg(sk, &msg_cipher, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+
+		err = ctx->sk_proto->sendpage(sk, ctx->cipher_page, 0,
+					      cipher_size, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+		if (err != cipher_size) {
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		ret = plain_size;
+	} else {
+		ret = quic_sendpage(ctx, sk, &msg_cipher, cipher_size,
+				    ctx->cipher_page);
+		// indicate full plaintext transmission to the caller.
+		if (ret > 0)
+			ret = plain_size;
+	}
+
+out_put_pages:
+	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
+
+out:
+	return ret;
+}
+
+static int quic_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_context *ctx;
+	int ret;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	rcu_read_unlock();
+	if (!ctx)
+		return -EINVAL;
+
+	mutex_lock(&ctx->sendmsg_mux);
+	ret = quic_sendmsg(sk, msg, len);
+	mutex_unlock(&ctx->sendmsg_mux);
+	return ret;
+}
+
+static void quic_release_resources(struct sock *sk)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_connection_rhash *connhash;
+	struct inet_sock *inet = inet_sk(sk);
+	struct rhashtable_iter hti;
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	/* The walk and rhashtable_destroy() below may sleep, so do not
+	 * hold the RCU read lock across them; ctx is stable here because
+	 * the socket is being released under lock_sock().
+	 */
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	rcu_read_unlock();
+	if (!ctx)
+		return;
+
+	sk_proto = ctx->sk_proto;
+
+	rhashtable_walk_enter(&ctx->tx_connections, &hti);
+	rhashtable_walk_start(&hti);
+
+	while ((connhash = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(connhash)) {
+			if (PTR_ERR(connhash) == -EAGAIN)
+				continue;
+			break;
+		}
+
+		crypto_ctx = &connhash->crypto_ctx;
+		crypto_free_aead(crypto_ctx->packet_aead);
+		crypto_free_skcipher(crypto_ctx->header_tfm);
+		memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	}
+
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+	rhashtable_destroy(&ctx->tx_connections);
+
+	if (ctx->cipher_page) {
+		quic_free_cipher_page(ctx->cipher_page);
+		ctx->cipher_page = NULL;
+	}
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, NULL);
+	WRITE_ONCE(sk->sk_prot, sk_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+
+	kfree_rcu(ctx, rcu);
+}
+
+static void
+quic_prep_protos(unsigned int af, struct proto *proto, const struct proto *base)
+{
+	if (likely(test_bit(af, &af_init_done)))
+		return;
+
+	spin_lock(&quic_proto_lock);
+	if (test_bit(af, &af_init_done))
+		goto out_unlock;
+
+	*proto			= *base;
+	proto->setsockopt	= quic_setsockopt;
+	proto->getsockopt	= quic_getsockopt;
+	proto->sendmsg		= quic_sendmsg_locked;
+
+	smp_mb__before_atomic(); /* proto calls should be visible first */
+	set_bit(af, &af_init_done);
+
+out_unlock:
+	spin_unlock(&quic_proto_lock);
+}
+
+static void quic_update_proto(struct sock *sk, struct quic_context *ctx)
+{
+	struct proto *udp_proto, *quic_proto;
+	struct inet_sock *inet = inet_sk(sk);
+
+	udp_proto = READ_ONCE(sk->sk_prot);
+	ctx->sk_proto = udp_proto;
+	quic_proto = sk->sk_family == AF_INET ? &quic_v4_proto : &quic_v6_proto;
+
+	quic_prep_protos(sk->sk_family, quic_proto, udp_proto);
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, ctx);
+	WRITE_ONCE(sk->sk_prot, quic_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+}
+
+static int quic_init(struct sock *sk)
+{
+	struct quic_context *ctx;
+
+	ctx = quic_ctx_create();
+	if (!ctx)
+		return -ENOMEM;
+
+	quic_update_proto(sk, ctx);
+
+	return 0;
+}
+
+static void quic_release(struct sock *sk)
+{
+	lock_sock(sk);
+	quic_release_resources(sk);
+	release_sock(sk);
+}
+
+static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
+	.name		= "quic-crypto",
+	.owner		= THIS_MODULE,
+	.init		= quic_init,
+	.release	= quic_release,
+};
+
+static int __init quic_register(void)
+{
+	udp_register_ulp(&quic_ulp_ops);
+	return 0;
+}
+
+static void __exit quic_unregister(void)
+{
+	udp_unregister_ulp(&quic_ulp_ops);
+}
+
+module_init(quic_register);
+module_exit(quic_unregister);
+
+MODULE_DESCRIPTION("QUIC crypto ULP");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_UDP_ULP("quic-crypto");
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 5/6] net: Add flow counters and Tx processing error counter
  2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (3 preceding siblings ...)
  2022-08-01 19:52   ` [RFC net-next 4/6] net: Implement QUIC offload functions Adel Abouchaev
@ 2022-08-01 19:52   ` Adel Abouchaev
  2022-08-01 19:52   ` [RFC net-next 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  2022-08-05  3:37   ` [RFC net-next 0/6] net: support QUIC crypto Bagas Sanjaya
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-01 19:52 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, Adel Abouchaev

Add flow counters. The total flow counter is cumulative, the current
counter shows the number of flows currently in flight, and the error
counter accumulates the number of errors encountered during Tx processing.
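
The counters surface through /proc/net/quic_stat (added in quic_proc.c
below). The following minimal reader is for illustration only; the
counter names match quic_proc.c, while the values shown in its comment
are made up:

	#include <stdio.h>

	int main(void)
	{
		char line[128];
		FILE *f = fopen("/proc/net/quic_stat", "r");

		if (!f) {
			perror("/proc/net/quic_stat");
			return 1;
		}
		/* Expected shape of the output, e.g.:
		 *	QuicCurrTxSw	2
		 *	QuicTxSw	14
		 *	QuicTxSwError	0
		 */
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
		return 0;
	}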

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/net/netns/mib.h   |  3 +++
 include/net/quic.h        | 10 +++++++++
 include/net/snmp.h        |  6 +++++
 include/uapi/linux/snmp.h | 11 ++++++++++
 net/quic/Makefile         |  2 +-
 net/quic/quic_main.c      | 46 +++++++++++++++++++++++++++++++++++++++
 net/quic/quic_proc.c      | 45 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 net/quic/quic_proc.c

diff --git a/include/net/netns/mib.h b/include/net/netns/mib.h
index 7e373664b1e7..dcbba3d1ceec 100644
--- a/include/net/netns/mib.h
+++ b/include/net/netns/mib.h
@@ -24,6 +24,9 @@ struct netns_mib {
 #if IS_ENABLED(CONFIG_TLS)
 	DEFINE_SNMP_STAT(struct linux_tls_mib, tls_statistics);
 #endif
+#if IS_ENABLED(CONFIG_QUIC)
+	DEFINE_SNMP_STAT(struct linux_quic_mib, quic_statistics);
+#endif
 #ifdef CONFIG_MPTCP
 	DEFINE_SNMP_STAT(struct mptcp_mib, mptcp_statistics);
 #endif
diff --git a/include/net/quic.h b/include/net/quic.h
index 15e04ea08c53..b6327f3b7632 100644
--- a/include/net/quic.h
+++ b/include/net/quic.h
@@ -25,6 +25,16 @@
 #define QUIC_MAX_PLAIN_PAGES		16
 #define QUIC_MAX_CIPHER_PAGES_ORDER	4
 
+#define __QUIC_INC_STATS(net, field)				\
+	__SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_INC_STATS(net, field)				\
+	SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_DEC_STATS(net, field)				\
+	SNMP_DEC_STATS((net)->mib.quic_statistics, field)
+
+int __net_init quic_proc_init(struct net *net);
+void __net_exit quic_proc_fini(struct net *net);
+
 struct quic_internal_crypto_context {
 	struct quic_connection_info	conn_info;
 	struct crypto_skcipher		*header_tfm;
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 468a67836e2f..f94680a3e9e8 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -117,6 +117,12 @@ struct linux_tls_mib {
 	unsigned long	mibs[LINUX_MIB_TLSMAX];
 };
 
+/* Linux QUIC */
+#define LINUX_MIB_QUICMAX	__LINUX_MIB_QUICMAX
+struct linux_quic_mib {
+	unsigned long	mibs[LINUX_MIB_QUICMAX];
+};
+
 #define DEFINE_SNMP_STAT(type, name)	\
 	__typeof__(type) __percpu *name
 #define DEFINE_SNMP_STAT_ATOMIC(type, name)	\
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 904909d020e2..708f62e28c9d 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -347,4 +347,15 @@ enum
 	__LINUX_MIB_TLSMAX
 };
 
+/* linux QUIC mib definitions */
+enum
+{
+	LINUX_MIB_QUICNUM = 0,
+	LINUX_MIB_QUICCURRTXSW,			/* QuicCurrTxSw */
+	LINUX_MIB_QUICTXSW,			/* QuicTxSw */
+	LINUX_MIB_QUICTXSWERROR,		/* QuicTxSwError */
+	__LINUX_MIB_QUICMAX
+};
+
+
 #endif	/* _LINUX_SNMP_H */
diff --git a/net/quic/Makefile b/net/quic/Makefile
index 928239c4d08c..a885cd8bc4e0 100644
--- a/net/quic/Makefile
+++ b/net/quic/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_QUIC) += quic.o
 
-quic-y := quic_main.o
+quic-y := quic_main.o quic_proc.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
index e738c8130a4f..eb0fdeabd3c4 100644
--- a/net/quic/quic_main.c
+++ b/net/quic/quic_main.c
@@ -362,6 +362,8 @@ static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
 	if (rc < 0)
 		goto err_free_ciphers;
 
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSW);
 	return 0;
 
 err_free_ciphers:
@@ -411,6 +413,7 @@ static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
 	crypto_free_aead(crypto_ctx->packet_aead);
 	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
 	kfree(connhash);
+	QUIC_DEC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
 
 	return 0;
 }
@@ -436,6 +439,9 @@ static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 		break;
 	}
 
+	if (rc)
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return rc;
 }
 
@@ -1242,6 +1248,9 @@ static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
 
 out:
+	if (unlikely(ret < 0))
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return ret;
 }
 
@@ -1374,6 +1383,36 @@ static void quic_release(struct sock *sk)
 	release_sock(sk);
 }
 
+static int __net_init quic_init_net(struct net *net)
+{
+	int err;
+
+	net->mib.quic_statistics = alloc_percpu(struct linux_quic_mib);
+	if (!net->mib.quic_statistics)
+		return -ENOMEM;
+
+	err = quic_proc_init(net);
+	if (err)
+		goto err_free_stats;
+
+	return 0;
+
+err_free_stats:
+	free_percpu(net->mib.quic_statistics);
+	return err;
+}
+
+static void __net_exit quic_exit_net(struct net *net)
+{
+	quic_proc_fini(net);
+	free_percpu(net->mib.quic_statistics);
+}
+
+static struct pernet_operations quic_proc_ops = {
+	.init = quic_init_net,
+	.exit = quic_exit_net,
+};
+
 static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 	.name		= "quic-crypto",
 	.owner		= THIS_MODULE,
@@ -1383,6 +1422,12 @@ static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 
 static int __init quic_register(void)
 {
+	int err;
+
+	err = register_pernet_subsys(&quic_proc_ops);
+	if (err)
+		return err;
+
 	udp_register_ulp(&quic_ulp_ops);
 	return 0;
 }
@@ -1390,6 +1435,7 @@ static int __init quic_register(void)
 static void __exit quic_unregister(void)
 {
 	udp_unregister_ulp(&quic_ulp_ops);
+	unregister_pernet_subsys(&quic_proc_ops);
 }
 
 module_init(quic_register);
diff --git a/net/quic/quic_proc.c b/net/quic/quic_proc.c
new file mode 100644
index 000000000000..cb4fe7a589b5
--- /dev/null
+++ b/net/quic/quic_proc.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Meta Platforms, Inc. */
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <net/snmp.h>
+#include <net/quic.h>
+
+#ifdef CONFIG_PROC_FS
+static const struct snmp_mib quic_mib_list[] = {
+	SNMP_MIB_ITEM("QuicCurrTxSw", LINUX_MIB_QUICCURRTXSW),
+	SNMP_MIB_ITEM("QuicTxSw", LINUX_MIB_QUICTXSW),
+	SNMP_MIB_ITEM("QuicTxSwError", LINUX_MIB_QUICTXSWERROR),
+	SNMP_MIB_SENTINEL
+};
+
+static int quic_statistics_seq_show(struct seq_file *seq, void *v)
+{
+	unsigned long buf[LINUX_MIB_QUICMAX] = {};
+	struct net *net = seq->private;
+	int i;
+
+	snmp_get_cpu_field_batch(buf, quic_mib_list, net->mib.quic_statistics);
+	for (i = 0; quic_mib_list[i].name; i++)
+		seq_printf(seq, "%-32s\t%lu\n", quic_mib_list[i].name, buf[i]);
+
+	return 0;
+}
+#endif
+
+int __net_init quic_proc_init(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	if (!proc_create_net_single("quic_stat", 0444, net->proc_net,
+				    quic_statistics_seq_show, NULL))
+		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
+
+	return 0;
+}
+
+void __net_exit quic_proc_fini(struct net *net)
+{
+	remove_proc_entry("quic_stat", net->proc_net);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 6/6] net: Add self tests for ULP operations, flow setup and crypto tests
  2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (4 preceding siblings ...)
  2022-08-01 19:52   ` [RFC net-next 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
@ 2022-08-01 19:52   ` Adel Abouchaev
  2022-08-05  3:37   ` [RFC net-next 0/6] net: support QUIC crypto Bagas Sanjaya
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-01 19:52 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, Adel Abouchaev

Add self tests for ULP operations, flow setup and crypto tests.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    2 +-
 tools/testing/selftests/net/quic.c     | 1024 ++++++++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   45 ++
 4 files changed, 1071 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index ffc35a22e914..bd4967e57803 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -38,3 +38,4 @@ ioam6_parser
 toeplitz
 tun
 cmsg_sender
+quic
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index db05b3764b77..aee89b0458b4 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -54,7 +54,7 @@ TEST_GEN_FILES += ipsec
 TEST_GEN_FILES += ioam6_parser
 TEST_GEN_FILES += gro
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
-TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun
+TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun quic
 TEST_GEN_FILES += toeplitz
 TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
diff --git a/tools/testing/selftests/net/quic.c b/tools/testing/selftests/net/quic.c
new file mode 100644
index 000000000000..20e425003fcb
--- /dev/null
+++ b/tools/testing/selftests/net/quic.c
@@ -0,0 +1,1024 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/limits.h>
+#include <linux/quic.h>
+#include <linux/socket.h>
+#include <linux/tls.h>
+#include <linux/tcp.h>
+#include <linux/types.h>
+#include <linux/udp.h>
+
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/sendfile.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+
+#include "../kselftest_harness.h"
+
+#define UDP_ULP		105
+
+#ifndef SOL_UDP
+#define SOL_UDP		17
+#endif
+
+// 1. QUIC ULP Registration Test
+
+FIXTURE(quic_ulp)
+{
+	int sfd;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_ulp)
+{
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv4)
+{
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7101,
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv6)
+{
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7102,
+};
+
+FIXTURE_SETUP(quic_ulp)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+		   sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+FIXTURE_TEARDOWN(quic_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_nonexistent_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "nonexistent", sizeof("nonexistent")), -1);
+	// If UDP_ULP option is not present, the error would be ENOPROTOOPT.
+	ASSERT_EQ(errno, ENOENT);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_quic_crypto_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+// 2. QUIC Data Path Operation Tests
+
+#define DO_NOT_SETUP_FLOW 0
+#define SETUP_FLOW 1
+
+#define DO_NOT_USE_CLIENT 0
+#define USE_CLIENT 1
+
+FIXTURE(quic_data)
+{
+	int sfd, c1fd, c2fd;
+	socklen_t len_c1;
+	socklen_t len_c2;
+	socklen_t len_s;
+
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_1;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_2;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_1_net_ns_fd;
+	int client_2_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_data)
+{
+	unsigned int af_client_1;
+	char *client_1_address;
+	unsigned short client_1_port;
+	uint8_t conn_id_1[8];
+	uint8_t conn_1_key[16];
+	uint8_t conn_1_iv[12];
+	uint8_t conn_1_hdr_key[16];
+	size_t conn_id_1_len;
+	bool setup_flow_1;
+	bool use_client_1;
+	unsigned int af_client_2;
+	char *client_2_address;
+	unsigned short client_2_port;
+	uint8_t conn_id_2[8];
+	uint8_t conn_2_key[16];
+	uint8_t conn_2_iv[12];
+	uint8_t conn_2_hdr_key[16];
+	size_t conn_id_2_len;
+	bool setup_flow_2;
+	bool use_client_2;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv4)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.1",
+	.client_1_port = 6667,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6668,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	//.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 6669,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_two_conns)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.1",
+	.client_1_port = 6670,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6671,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6672,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv4_one_conn)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.3",
+	.client_1_port = 6676,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6676,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6677,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv6_one_conn)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.3",
+	.client_1_port = 6678,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6678,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6679,
+};
+
+FIXTURE_SETUP(quic_data)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client_1 == AF_INET) {
+		self->len_c1 = sizeof(self->client_1.addr);
+		self->client_1.addr.sin_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr.sin_addr);
+		self->client_1.addr.sin_port = htons(variant->client_1_port);
+	} else {
+		self->len_c1 = sizeof(self->client_1.addr6);
+		self->client_1.addr6.sin6_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr6.sin6_addr);
+		self->client_1.addr6.sin6_port = htons(variant->client_1_port);
+	}
+
+	if (variant->af_client_2 == AF_INET) {
+		self->len_c2 = sizeof(self->client_2.addr);
+		self->client_2.addr.sin_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr.sin_addr);
+		self->client_2.addr.sin_port = htons(variant->client_2_port);
+	} else {
+		self->len_c2 = sizeof(self->client_2.addr6);
+		self->client_2.addr6.sin6_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr6.sin6_addr);
+		self->client_2.addr6.sin6_port = htons(variant->client_2_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_1_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_1_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns12");
+	self->client_2_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_2_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		self->c1fd = socket(variant->af_client_1, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c1fd, SOL_SOCKET, SO_REUSEPORT, &optval,
+		   sizeof(optval)), -1);
+		if (variant->af_client_1 == AF_INET) {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr,
+					      &self->len_c1), 0);
+		} else {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr6,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr6,
+					      &self->len_c1), 0);
+		}
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		self->c2fd = socket(variant->af_client_2, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c2fd, SOL_SOCKET, SO_REUSEPORT, &optval,
+		   sizeof(optval)), -1);
+		if (variant->af_client_2 == AF_INET) {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr,
+					      &self->len_c2), 0);
+		} else {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr6,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr6,
+					      &self->len_c2), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+	   sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_data)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+	close(self->c1fd);
+	ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+	close(self->c2fd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_data, send_fail_no_flow)
+{
+	char const *test_str = "test_read";
+	int send_len = 10;
+
+	ASSERT_EQ(strlen(test_str) + 1, send_len);
+	EXPECT_EQ(sendto(self->sfd, test_str, send_len, 0,
+			 &self->client_1.addr, self->len_c1), -1);
+};
+
+TEST_F(quic_data, encrypt_two_conn_gso_1200_iov_2_size_9000_aesgcm128)
+{
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_connection_info conn_1_info;
+	struct quic_connection_info conn_2_info;
+	struct quic_tx_ancillary_data *anc_data;
+	socklen_t recv_addr_len_1;
+	socklen_t recv_addr_len_2;
+	struct cmsghdr *cmsg_hdr;
+	int frag_size = 1200;
+	int send_len = 9000;
+	struct iovec iov[2];
+	int msg_len = 4500;
+	struct msghdr msg;
+	char *test_str_1;
+	char *test_str_2;
+	char *buf_1;
+	char *buf_2;
+	int i;
+
+	test_str_1 = (char *)malloc(9000);
+	test_str_2 = (char *)malloc(9000);
+	memset(test_str_1, 0, 9000);
+	memset(test_str_2, 0, 9000);
+
+	buf_1 = (char *)malloc(10000);
+	buf_2 = (char *)malloc(10000);
+	for (i = 0; i < 9000; i += (1200 - 16)) {
+		test_str_1[i] = 0x40;
+		memcpy(&test_str_1[i + 1], &variant->conn_id_1,
+		       variant->conn_id_1_len);
+		test_str_1[i + 1 + variant->conn_id_1_len] = 0xca;
+
+		test_str_2[i] = 0x40;
+		memcpy(&test_str_2[i + 1], &variant->conn_id_2,
+		       variant->conn_id_2_len);
+		test_str_2[i + 1 + variant->conn_id_2_len] = 0xca;
+	}
+
+	// program the connection into the offload
+	conn_1_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_1_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_1_info.key.conn_id_length = variant->conn_id_1_len;
+	memcpy(conn_1_info.key.conn_id,
+	       &variant->conn_id_1,
+	       variant->conn_id_1_len);
+
+	conn_2_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_2_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_2_info.key.conn_id_length = variant->conn_id_2_len;
+	memcpy(conn_2_info.key.conn_id,
+	       &variant->conn_id_2,
+	       variant->conn_id_2_len);
+
+	memcpy(&conn_1_info.crypto_info_aes_gcm_128.packet_encryption_key,
+	       &variant->conn_1_key, 16);
+	memcpy(&conn_1_info.crypto_info_aes_gcm_128.packet_encryption_iv,
+	       &variant->conn_1_iv, 12);
+	memcpy(&conn_1_info.crypto_info_aes_gcm_128.header_encryption_key,
+	       &variant->conn_1_hdr_key, 16);
+	memcpy(&conn_2_info.crypto_info_aes_gcm_128.packet_encryption_key,
+	       &variant->conn_2_key, 16);
+	memcpy(&conn_2_info.crypto_info_aes_gcm_128.packet_encryption_iv,
+	       &variant->conn_2_iv, 12);
+	memcpy(&conn_2_info.crypto_info_aes_gcm_128.header_encryption_key,
+	       &variant->conn_2_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+
+	if (variant->setup_flow_1)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)), 0);
+
+	if (variant->setup_flow_2)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)), 0);
+
+	recv_addr_len_1 = self->len_c1;
+	recv_addr_len_2 = self->len_c2;
+
+	iov[0].iov_base = test_str_1;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_1 + 4500;
+	iov[1].iov_len = msg_len;
+
+	msg.msg_name = (self->client_1.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_1.addr
+		       : (void *)&self->client_1.addr6;
+	msg.msg_namelen = self->len_c1;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->flags = 0;
+	anc_data->conn_id_length = variant->conn_id_1_len;
+
+	if (variant->use_client_1)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	iov[0].iov_base = test_str_2;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_2 + 4500;
+	iov[1].iov_len = msg_len;
+	msg.msg_name = (self->client_2.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_2.addr
+		       : (void *)&self->client_2.addr6;
+	msg.msg_namelen = self->len_c2;
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_2_len;
+	anc_data->flags = 0;
+
+	if (variant->use_client_2)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		if (variant->af_client_1 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr,
+						   &recv_addr_len_1),
+					  1200);
+				// Validate framing is intact.
+				EXPECT_EQ(memcmp((void *)buf_1 + 1,
+						 &variant->conn_id_1,
+						 variant->conn_id_1_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr6,
+						   &recv_addr_len_1),
+					1200);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr6,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_1, test_str_1, send_len), 0);
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		if (variant->af_client_2 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr6,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr6,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_2, test_str_2, send_len), 0);
+	}
+
+	if (variant->use_client_1 && variant->use_client_2)
+		EXPECT_NE(memcmp(buf_1, buf_2, send_len), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	if (variant->setup_flow_1) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)),
+			  0);
+	}
+	if (variant->setup_flow_2) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)),
+			  0);
+	}
+	free(test_str_1);
+	free(test_str_2);
+	free(buf_1);
+	free(buf_2);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+// 3. QUIC Encryption Tests
+
+FIXTURE(quic_crypto)
+{
+	int sfd, cfd;
+	socklen_t len_c;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_crypto)
+{
+	unsigned int af_client;
+	char *client_address;
+	unsigned short client_port;
+	uint32_t algo;
+	uint8_t conn_id[8];
+	uint8_t conn_key[16];
+	uint8_t conn_iv[12];
+	uint8_t conn_hdr_key[16];
+	size_t conn_id_len;
+	bool setup_flow;
+	bool use_client;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_SETUP(quic_crypto)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client == AF_INET) {
+		self->len_c = sizeof(self->client.addr);
+		self->client.addr.sin_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr.sin_addr);
+		self->client.addr.sin_port = htons(variant->client_port);
+	} else {
+		self->len_c = sizeof(self->client.addr6);
+		self->client.addr6.sin6_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr6.sin6_addr);
+		self->client.addr6.sin6_port = htons(variant->client_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client) {
+		ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+		self->cfd = socket(variant->af_client, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->cfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			sizeof(optval)), -1);
+		if (variant->af_client == AF_INET) {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr,
+					      &self->len_c), 0);
+		} else {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr6,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr6,
+					      &self->len_c), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+	   sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s),
+			  0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s),
+			  0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_crypto)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	close(self->cfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7667,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {0x87, 0x71, 0xEA, 0x1D, 0xFB, 0xBE, 0x7A, 0x45, 0xBB,
+		0xE2, 0x7E, 0xBC, 0x0B, 0x53, 0x94, 0x99},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {0xC9, 0x8E, 0xFD, 0xF2, 0x0B, 0x64, 0x8C, 0x57,
+		0xB5, 0x0A, 0xB2, 0xD2, 0x21, 0xD3, 0x66, 0xA5},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7669,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7673,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {0x87, 0x71, 0xEA, 0x1D, 0xFB, 0xBE, 0x7A, 0x45, 0xBB,
+		0xE2, 0x7E, 0xBC, 0x0B, 0x53, 0x94, 0x99},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {0xC9, 0x8E, 0xFD, 0xF2, 0x0B, 0x64, 0x8C, 0x57,
+		0xB5, 0x0A, 0xB2, 0xD2, 0x21, 0xD3, 0x66, 0xA5},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7675,
+};
+
+TEST_F(quic_crypto, encrypt_test_vector_aesgcm128_single_flow_gso_in_control)
+{
+	char test_str[37] = {// Header, conn id and pkt num
+			     0x40, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0xCA,
+			     // Payload
+			     0x02, 0x80, 0xDE, 0x40, 0x39, 0x40, 0xF6, 0x00,
+			     0x01, 0x0B, 0x00, 0x0F, 0x65, 0x63, 0x68, 0x6F,
+			     0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+			     0x37, 0x38, 0x39
+	};
+
+	char match_str[53] = {
+			     0x46, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0x1C, 0x44, 0xB8, 0x41, 0xBB, 0xCF, 0x6E,
+			     0x0A, 0x2A, 0x24, 0xFB, 0xB4, 0x79, 0x62, 0xEA,
+			     0x59, 0x38, 0x1A, 0x0E, 0x50, 0x1E, 0x59, 0xED,
+			     0x3F, 0x8E, 0x7E, 0x5A, 0x70, 0xE4, 0x2A, 0xBC,
+			     0x2A, 0xFA, 0x2B, 0x54, 0xEB, 0x89, 0xC3, 0x2C,
+			     0xB6, 0x8C, 0x1E, 0xAB, 0x2D
+	};
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)
+			 + CMSG_SPACE(sizeof(uint16_t))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int send_len = sizeof(test_str);
+	int msg_len = sizeof(test_str);
+	uint16_t frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	int wrong_frag_size = 26;
+	socklen_t recv_addr_len;
+	struct iovec iov[2];
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(1024);
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.packet_encryption_key,
+	       &variant->conn_key, 16);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.packet_encryption_iv,
+	       &variant->conn_iv, 12);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.header_encryption_key,
+	       &variant->conn_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &wrong_frag_size,
+			     sizeof(wrong_frag_size)), 0);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+
+	iov[0].iov_base = test_str;
+	iov[0].iov_len = msg_len;
+
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_len;
+
+	cmsg_hdr = CMSG_NXTHDR(&msg, cmsg_hdr);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_SEGMENT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+	memcpy(CMSG_DATA(cmsg_hdr), (void *)&frag_size, sizeof(frag_size));
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr, &recv_addr_len),
+			  sizeof(match_str));
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  sizeof(match_str));
+	}
+	EXPECT_EQ(memcmp(buf, match_str, sizeof(match_str)), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_crypto, encrypt_test_vector_aesgcm128_single_flow_gso_in_setsockopt)
+{
+	char test_str[37] = {// Header, conn id and pkt num
+			     0x40, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0xCA,
+			     // Payload
+			     0x02, 0x80, 0xDE, 0x40, 0x39, 0x40, 0xF6, 0x00,
+			     0x01, 0x0B, 0x00, 0x0F, 0x65, 0x63, 0x68, 0x6F,
+			     0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+			     0x37, 0x38, 0x39
+	};
+
+	char match_str[53] = {
+			     0x46, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0x1C, 0x44, 0xB8, 0x41, 0xBB, 0xCF, 0x6E,
+			     0x0A, 0x2A, 0x24, 0xFB, 0xB4, 0x79, 0x62, 0xEA,
+			     0x59, 0x38, 0x1A, 0x0E, 0x50, 0x1E, 0x59, 0xED,
+			     0x3F, 0x8E, 0x7E, 0x5A, 0x70, 0xE4, 0x2A, 0xBC,
+			     0x2A, 0xFA, 0x2B, 0x54, 0xEB, 0x89, 0xC3, 0x2C,
+			     0xB6, 0x8C, 0x1E, 0xAB, 0x2D
+	};
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int send_len = sizeof(test_str);
+	int msg_len = sizeof(test_str);
+	struct cmsghdr *cmsg_hdr;
+	socklen_t recv_addr_len;
+	int frag_size = 1200;
+	struct iovec iov[2];
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(1024);
+
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(&conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.packet_encryption_key,
+	       &variant->conn_key, 16);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.packet_encryption_iv,
+	       &variant->conn_iv, 12);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.header_encryption_key,
+	       &variant->conn_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)),
+		  0);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)),
+		  0);
+
+	recv_addr_len = self->len_c;
+
+	iov[0].iov_base = test_str;
+	iov[0].iov_len = msg_len;
+
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_len;
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr, &recv_addr_len),
+			  sizeof(match_str));
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  sizeof(match_str));
+	}
+	EXPECT_EQ(memcmp(buf, match_str, sizeof(match_str)), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/quic.sh b/tools/testing/selftests/net/quic.sh
new file mode 100755
index 000000000000..6c684e670e82
--- /dev/null
+++ b/tools/testing/selftests/net/quic.sh
@@ -0,0 +1,45 @@
+#!/bin/bash
+
+sudo ip netns add ns11
+sudo ip netns add ns12
+sudo ip netns add ns2
+sudo ip link add veth11 type veth peer name br-veth11
+sudo ip link add veth12 type veth peer name br-veth12
+sudo ip link add veth2 type veth peer name br-veth2
+sudo ip link set veth11 netns ns11
+sudo ip link set veth12 netns ns12
+sudo ip link set veth2 netns ns2
+sudo ip netns exec ns11 ip addr add 10.0.0.1/24 dev veth11
+sudo ip netns exec ns11 ip addr add ::ffff:10.0.0.1/96 dev veth11
+sudo ip netns exec ns11 ip addr add 2001::1/64 dev veth11
+sudo ip netns exec ns12 ip addr add 10.0.0.3/24 dev veth12
+sudo ip netns exec ns12 ip addr add ::ffff:10.0.0.3/96 dev veth12
+sudo ip netns exec ns12 ip addr add 2001::3/64 dev veth12
+sudo ip netns exec ns2 ip addr add 10.0.0.2/24 dev veth2
+sudo ip netns exec ns2 ip addr add ::ffff:10.0.0.2/96 dev veth2
+sudo ip netns exec ns2 ip addr add 2001::2/64 dev veth2
+sudo ip link add name br1 type bridge forward_delay 0
+sudo ip link set br1 up
+sudo ip link set br-veth11 up
+sudo ip link set br-veth12 up
+sudo ip link set br-veth2 up
+sudo ip netns exec ns11 ip link set veth11 up
+sudo ip netns exec ns12 ip link set veth12 up
+sudo ip netns exec ns2 ip link set veth2 up
+sudo ip link set br-veth11 master br1
+sudo ip link set br-veth12 master br1
+sudo ip link set br-veth2 master br1
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+
+printf "%s" "Waiting for bridge to start forwarding ..."
+while ! timeout 0.5 sudo ip netns exec ns2 ping -c 1 -n 2001::1 &> /dev/null
+do
+	printf "%c" "."
+done
+printf "\n%s\n"  "Bridge is operational"
+
+sudo ./quic
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+sudo ip netns delete ns2
+sudo ip netns delete ns12
+sudo ip netns delete ns11
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 0/6] net: support QUIC crypto
       [not found] <Adel Abouchaev <adel.abushaev@gmail.com>
  2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
@ 2022-08-03 16:40 ` Adel Abouchaev
  2022-08-03 16:40   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
                     ` (5 more replies)
  2022-08-06  0:11 ` [RFC net-next v2 0/6] net: support QUIC crypto Adel Abouchaev
                   ` (5 subsequent siblings)
  7 siblings, 6 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-03 16:40 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

QUIC requires end to end encryption of the data. The application usually
prepares the data in clear text, encrypts and calls send() which implies
multiple copies of the data before the packets hit the networking stack.
Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
pressure by reducing the number of copies.

The scope of kernel support is limited to the symmetric cryptography,
leaving the handshake to the user space library. For QUIC in particular,
the application packets that require symmetric cryptography are the 1RTT
packets with short headers. Kernel will encrypt the application packets
on transmission and decrypt on receive. This series implements Tx only,
because in QUIC server applications Tx outweighs Rx by orders of
magnitude.

Supporting the combination of QUIC and GSO requires the application to
correctly place the data and the kernel to correctly slice it. The
encryption process appends an arbitrary number of bytes (tag) to the end
of the message to authenticate it. The GSO value should include this
overhead, the offload would then subtract the tag size to parse the
input on Tx before chunking and encrypting it.
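
For illustration only, and not part of this series, the sketch below
shows how a sender could line GSO up with the tag overhead, modeled on
the selftest in patch 6/6. It assumes a connected UDP socket that
already has the quic-crypto ULP and a Tx connection installed, and that
the uapi additions from this series (UDP_QUIC_ENCRYPT and struct
quic_tx_ancillary_data in <linux/quic.h>) are available; quic_send_gso
and QUIC_TAG_LEN are illustrative names, and the 16-byte tag matches
AES-GCM-128.

	#include <stdint.h>
	#include <sys/types.h>
	#include <sys/socket.h>
	#include <sys/uio.h>
	#include <netinet/in.h>
	#include <linux/quic.h>
	#include <linux/udp.h>

	#ifndef SOL_UDP
	#define SOL_UDP 17
	#endif

	#define QUIC_TAG_LEN 16	/* AES-GCM-128 auth tag */

	/* Lay out @len bytes as (gso - QUIC_TAG_LEN)-byte 1RTT packets;
	 * the offload appends the tag so each wire segment is gso bytes.
	 * The socket is assumed to be connected (no msg_name is set).
	 */
	ssize_t quic_send_gso(int fd, void *plain, size_t len, int gso,
			      uint64_t next_pkt_num, uint8_t conn_id_len)
	{
		uint8_t cbuf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))] = {0};
		struct quic_tx_ancillary_data *anc;
		struct iovec iov = { .iov_base = plain, .iov_len = len };
		struct msghdr msg = {
			.msg_iov = &iov, .msg_iovlen = 1,
			.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
		};
		struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

		/* gso includes the tag; the application data does not */
		if (setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso, sizeof(gso)))
			return -1;

		cm->cmsg_level = IPPROTO_UDP;
		cm->cmsg_type = UDP_QUIC_ENCRYPT;
		cm->cmsg_len = CMSG_LEN(sizeof(*anc));
		anc = (struct quic_tx_ancillary_data *)CMSG_DATA(cm);
		anc->next_pkt_num = next_pkt_num;
		anc->conn_id_length = conn_id_len;
		anc->flags = 0;

		return sendmsg(fd, &msg, 0);
	}

With gso = 1200 the application lays its short-header packets out in
1184-byte chunks, and the offload emits 1200-byte segments on the wire.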

With the kernel cryptography, the buffer copy operation is conjoined
with the encryption operation. The memory bandwidth is reduced by 5-8%.
When devices supporting QUIC encryption in hardware come to the market,
we will be able to free further 7% of CPU utilization which is used
today for crypto operations.


Adel Abouchaev (6):
  Documentation on QUIC kernel Tx crypto.
  Define QUIC specific constants, control and data plane structures
  Add UDP ULP operations, initialization and handling prototype
    functions.
  Implement QUIC offload functions
  Add flow counters and Tx processing error counter
  Add self tests for ULP operations, flow setup and crypto tests

 Documentation/networking/quic.rst      |  176 +++
 include/net/inet_sock.h                |    2 +
 include/net/netns/mib.h                |    3 +
 include/net/quic.h                     |   59 +
 include/net/snmp.h                     |    6 +
 include/net/udp.h                      |   33 +
 include/uapi/linux/quic.h              |   61 +
 include/uapi/linux/snmp.h              |   11 +
 include/uapi/linux/udp.h               |    4 +
 net/Kconfig                            |    1 +
 net/Makefile                           |    1 +
 net/ipv4/Makefile                      |    3 +-
 net/ipv4/udp.c                         |   14 +
 net/ipv4/udp_ulp.c                     |  190 ++++
 net/quic/Kconfig                       |   16 +
 net/quic/Makefile                      |    8 +
 net/quic/quic_main.c                   | 1446 ++++++++++++++++++++++++
 net/quic/quic_proc.c                   |   45 +
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    2 +-
 tools/testing/selftests/net/quic.c     | 1024 +++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   45 +
 22 files changed, 3149 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/networking/quic.rst
 create mode 100644 include/net/quic.h
 create mode 100644 include/uapi/linux/quic.h
 create mode 100644 net/ipv4/udp_ulp.c
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c
 create mode 100644 net/quic/quic_proc.c
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

-- 
2.30.2


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-03 16:40 ` Adel Abouchaev
@ 2022-08-03 16:40   ` Adel Abouchaev
  2022-08-03 18:23     ` Andrew Lunn
  2022-08-04 13:57     ` Jonathan Corbet
  2022-08-03 16:40   ` [RFC net-next 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-03 16:40 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add the Documentation/networking/quic.rst file to describe the kernel QUIC
code.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 Documentation/networking/quic.rst | 176 ++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 Documentation/networking/quic.rst

diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..eaa2d36310be
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,176 @@
+.. _kernel_quic:
+
+===========
+KERNEL QUIC
+===========
+
+Overview
+========
+
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 for more information on QUIC.
+
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are the 1RTT
+packets of a QUIC connection. Encryption of all other packets is still done
+by the QUIC library in user space.
+
+
+
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+
+A QUIC connection originates and terminates in the application, using one of
+many available QUIC libraries. The application instantiates a QUIC client and
+a QUIC server in some form and configures them to use certain addresses and
+ports for the source and destination. The client and server negotiate the set
+of keys to protect the communication during different phases of the
+connection, maintain the connection and perform congestion control.
+
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow that should be encrypted by the kernel needs to be registered with
+the kernel using socket API. A setsockopt() call on the socket creates an
+association between the QUIC connection ID of the flow with the encryption
+parameters for the crypto operations:
+
+.. code-block:: c
+
+  struct quic_connection_info conn_info;
+  char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  const size_t conn_id_len = sizeof(conn_id);
+  char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                       0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+  char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                      0x08, 0x09, 0x0a, 0x0b};
+  char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+                       0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f};
+
+  conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+  memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+  conn_info.key.conn_id_length = 5;
+  memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE - conn_id_len],
+         &conn_id, conn_id_len);
+
+  memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
+  memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
+  memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
+
+  setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+             sizeof(conn_info));
+
+
+Requesting to remove QUIC Tx kernel encryption from the connection
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. To explicitly remove the
+offload for a connection during the lifetime of the socket, the process is
+similar to adding the flow. Only the connection ID and its length need to be
+supplied to remove the connection from the offload:
+
+.. code-block:: c
+
+  memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+  conn_info.key.conn_id_length = 5;
+  memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE - conn_id_len],
+         &conn_id, conn_id_len);
+  setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+             sizeof(conn_info));
+
+Sending QUIC application data
+-----------------------------
+
+For QUIC Tx encryption offload, the application should use the sendmsg()
+socket call and provide ancillary data carrying the connection ID length and
+offload flags, so that the kernel can perform the encryption and, if
+requested, GSO segmentation.
+
+.. code-block:: c
+
+  size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+  uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+  struct quic_tx_ancillary_data *anc_data;
+  size_t quic_data_len = 4500;
+  struct cmsghdr *cmsg_hdr;
+  char quic_data[9000];
+  struct iovec iov[2];
+  int send_len = 9000;
+  struct msghdr msg = {0};
+  int err;
+
+  iov[0].iov_base = quic_data;
+  iov[0].iov_len = quic_data_len;
+  iov[1].iov_base = quic_data + 4500;
+  iov[1].iov_len = quic_data_len;
+
+  if (client.addr.sin_family == AF_INET) {
+    msg.msg_name = &client.addr;
+    msg.msg_namelen = sizeof(client.addr);
+  } else {
+    msg.msg_name = &client.addr6;
+    msg.msg_namelen = sizeof(client.addr6);
+  }
+
+  msg.msg_iov = iov;
+  msg.msg_iovlen = 2;
+  msg.msg_control = cmsg_buf;
+  msg.msg_controllen = sizeof(cmsg_buf);
+  cmsg_hdr = CMSG_FIRSTHDR(&msg);
+  cmsg_hdr->cmsg_level = IPPROTO_UDP;
+  cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+  cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+  anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+  anc_data->flags = 0;
+  anc_data->next_pkt_num = 0x0d65c9;
+  anc_data->conn_id_length = conn_id_len;
+  err = sendmsg(fd, &msg, 0);
+
+The kernel QUIC Tx offload reads the data from user space, encrypting and
+copying it into the ciphertext buffer in a single operation.
+
+
+Sending QUIC application data with GSO
+--------------------------------------
+
+When GSO is in use, the kernel uses the GSO fragment size as the target size
+of the ciphertext. Packets from user space should be laid out at a stride of
+the GSO fragment size minus the tag size of the chosen cipher. For a GSO
+fragment size of 1200 and a 16 byte tag, plaintext packets should follow each
+other every 1184 bytes. After encryption, the rest of the UDP and IP stack
+uses the configured GSO fragment size, which includes the trailing tag bytes.
+
+To set up GSO fragmentation:
+
+.. code-block:: c
+
+  setsockopt(fd, SOL_UDP, UDP_SEGMENT, &frag_size, sizeof(frag_size));
+
+If the GSO fragment size is provided in ancillary data within the sendmsg()
+call, the value in ancillary data will take precedence over the segment size
+provided in setsockopt to split the payload into packets. This is consistent
+with the UDP stack behavior.
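+
+A minimal sketch, building on the sendmsg() example above, of carrying the
+GSO size in ancillary data next to the encryption request; the fragment size
+of 1200 and the buffer sizing are illustrative:
+
+.. code-block:: c
+
+  uint16_t frag_size = 1200;
+  uint8_t cmsg_buf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data)) +
+                   CMSG_SPACE(sizeof(uint16_t))];
+  struct cmsghdr *cmsg_hdr;
+
+  msg.msg_control = cmsg_buf;
+  msg.msg_controllen = sizeof(cmsg_buf);
+
+  /* First control message: QUIC Tx encryption parameters. */
+  cmsg_hdr = CMSG_FIRSTHDR(&msg);
+  cmsg_hdr->cmsg_level = IPPROTO_UDP;
+  cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+  cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(struct quic_tx_ancillary_data));
+  /* ... fill struct quic_tx_ancillary_data as in the example above ... */
+
+  /* Second control message: per-call GSO size, overriding UDP_SEGMENT. */
+  cmsg_hdr = CMSG_NXTHDR(&msg, cmsg_hdr);
+  cmsg_hdr->cmsg_level = SOL_UDP;
+  cmsg_hdr->cmsg_type = UDP_SEGMENT;
+  cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+  memcpy(CMSG_DATA(cmsg_hdr), &frag_size, sizeof(frag_size));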
+
+Integrating to userspace QUIC libraries
+---------------------------------------
+
+Integration with user-space QUIC libraries depends on the implementation of
+the QUIC protocol. For the MVFST library, the control plane is integrated
+into the handshake callbacks to configure the flows on the socket, and the
+data plane is integrated into the methods that perform encryption and hand
+the packets to the batch scheduler for transmission on the socket.
+
+The MVFST library can be found at https://github.com/facebookincubator/mvfst.
+
+Statistics
+==========
+
+QUIC Tx offload to the kernel has counters reflected in /proc/net/quic_stat:
+
+  QuicCurrTxSw  - number of currently active kernel offloaded QUIC connections
+  QuicTxSw      - cumulative total number of offloaded QUIC connections
+  QuicTxSwError - cumulative total number of errors during QUIC Tx offload in
+                  the kernel
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 2/6] net: Define QUIC specific constants, control and data plane structures
  2022-08-03 16:40 ` Adel Abouchaev
  2022-08-03 16:40   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-08-03 16:40   ` Adel Abouchaev
  2022-08-03 16:40   ` [RFC net-next 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-03 16:40 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Define control and data plane structures to pass in control plane for
flow add/remove and during packet send within ancillary data. Define
constants to use within SOL_UDP to program QUIC sockets.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/uapi/linux/quic.h | 61 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/udp.h  |  3 ++
 2 files changed, 64 insertions(+)
 create mode 100644 include/uapi/linux/quic.h

diff --git a/include/uapi/linux/quic.h b/include/uapi/linux/quic.h
new file mode 100644
index 000000000000..79680b8b18a6
--- /dev/null
+++ b/include/uapi/linux/quic.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_QUIC_H
+#define _UAPI_LINUX_QUIC_H
+
+#include <linux/types.h>
+#include <linux/tls.h>
+
+#define QUIC_MAX_CONNECTION_ID_SIZE 20
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_BYPASS_ENCRYPTION 0x01
+
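+/* Passed with sendmsg() as UDP_QUIC_ENCRYPT ancillary data. next_pkt_num is
+ * the expected next packet number, used to reconstruct the full packet
+ * number from the truncated value carried in the short header.
+ */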
+struct quic_tx_ancillary_data {
+	__aligned_u64	next_pkt_num;
+	__u8	flags;
+	__u8	conn_id_length;
+};
+
+struct quic_connection_info_key {
+	__u8	conn_id[QUIC_MAX_CONNECTION_ID_SIZE];
+	__u8	conn_id_length;
+};
+
+struct quic_aes_gcm_128 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+};
+
+struct quic_aes_gcm_256 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_256_IV_SIZE];
+};
+
+struct quic_aes_ccm_128 {
+	__u8	header_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_CCM_128_IV_SIZE];
+};
+
+struct quic_chacha20_poly1305 {
+	__u8	header_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE];
+};
+
+struct quic_connection_info {
+	__u16	cipher_type;
+	struct quic_connection_info_key		key;
+	union {
+		struct quic_aes_gcm_128 aes_gcm_128;
+		struct quic_aes_gcm_256 aes_gcm_256;
+		struct quic_aes_ccm_128 aes_ccm_128;
+		struct quic_chacha20_poly1305 chacha20_poly1305;
+	};
+};
+
+#endif
+
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 4828794efcf8..0ee4c598e70b 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,9 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
+#define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
+#define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 3/6] net: Add UDP ULP operations, initialization and handling prototype functions.
  2022-08-03 16:40 ` Adel Abouchaev
  2022-08-03 16:40   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
  2022-08-03 16:40   ` [RFC net-next 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
@ 2022-08-03 16:40   ` Adel Abouchaev
  2022-08-03 16:40   ` [RFC net-next 4/6] net: Implement QUIC offload functions Adel Abouchaev
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-03 16:40 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Define functions to add UDP ULP handling, registration with the UDP protocol
and supporting data structures. Create a structure for the QUIC ULP and add
empty prototype functions to support it.
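
As a minimal sketch (not part of this patch), a kernel module would register
itself through the interface added here roughly as follows; all names other
than udp_register_ulp(), udp_unregister_ulp(), struct udp_ulp_ops, UDP_ULP
and MODULE_ALIAS_UDP_ULP() are hypothetical:

	#include <linux/module.h>
	#include <net/udp.h>

	static int sample_ulp_init(struct sock *sk)
	{
		/* Allocate and attach per-socket ULP state here. */
		return 0;
	}

	static void sample_ulp_release(struct sock *sk)
	{
		/* Free per-socket ULP state here. */
	}

	static struct udp_ulp_ops sample_udp_ulp_ops = {
		.name		= "sample",
		.owner		= THIS_MODULE,
		.init		= sample_ulp_init,
		.release	= sample_ulp_release,
	};

	static int __init sample_ulp_module_init(void)
	{
		return udp_register_ulp(&sample_udp_ulp_ops);
	}

	static void __exit sample_ulp_module_exit(void)
	{
		udp_unregister_ulp(&sample_udp_ulp_ops);
	}

	module_init(sample_ulp_module_init);
	module_exit(sample_ulp_module_exit);
	MODULE_ALIAS_UDP_ULP("sample");
	MODULE_LICENSE("GPL");

User space would then attach the ULP to a UDP socket with
setsockopt(fd, SOL_UDP, UDP_ULP, "sample", sizeof("sample")).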

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/net/inet_sock.h  |   2 +
 include/net/udp.h        |  33 +++++++
 include/uapi/linux/udp.h |   1 +
 net/Kconfig              |   1 +
 net/Makefile             |   1 +
 net/ipv4/Makefile        |   3 +-
 net/ipv4/udp.c           |   6 ++
 net/ipv4/udp_ulp.c       | 190 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/udp_ulp.c

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 6395f6b9a5d2..e9c44b3ccffe 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -238,6 +238,8 @@ struct inet_sock {
 	__be32			mc_addr;
 	struct ip_mc_socklist __rcu	*mc_list;
 	struct inet_cork_full	cork;
+	const struct udp_ulp_ops	*udp_ulp_ops;
+	void __rcu		*ulp_data;
 };
 
 #define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
diff --git a/include/net/udp.h b/include/net/udp.h
index 8dd4aa1485a6..f50011a20c92 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -523,4 +523,37 @@ struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
 int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 #endif
 
+/*
+ * Interface for adding Upper Level Protocols over UDP
+ */
+
+#define UDP_ULP_NAME_MAX	16
+#define UDP_ULP_MAX		128
+
+struct udp_ulp_ops {
+	struct list_head	list;
+
+	/* initialize ulp */
+	int (*init)(struct sock *sk);
+	/* cleanup ulp */
+	void (*release)(struct sock *sk);
+
+	char		name[UDP_ULP_NAME_MAX];
+	struct module	*owner;
+};
+
+int udp_register_ulp(struct udp_ulp_ops *type);
+void udp_unregister_ulp(struct udp_ulp_ops *type);
+int udp_set_ulp(struct sock *sk, const char *name);
+void udp_get_available_ulp(char *buf, size_t len);
+void udp_cleanup_ulp(struct sock *sk);
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval,
+		       unsigned int optlen);
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval,
+		       int __user *optlen);
+
+#define MODULE_ALIAS_UDP_ULP(name)\
+	__MODULE_INFO(alias, alias_userspace, name);\
+	__MODULE_INFO(alias, alias_udp_ulp, "udp-ulp-" name)
+
 #endif	/* _UDP_H */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 0ee4c598e70b..893691f0108a 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_ULP		105	/* Attach ULP to a UDP socket */
 #define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
 #define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
 #define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
diff --git a/net/Kconfig b/net/Kconfig
index 6b78f695caa6..93e3b1308aec 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -63,6 +63,7 @@ menu "Networking options"
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/tls/Kconfig"
+source "net/quic/Kconfig"
 source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 source "net/smc/Kconfig"
diff --git a/net/Makefile b/net/Makefile
index fbfeb8a0bb37..28565bfe29cb 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -16,6 +16,7 @@ obj-y				+= ethernet/ 802/ sched/ netlink/ bpf/ ethtool/
 obj-$(CONFIG_NETFILTER)		+= netfilter/
 obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_TLS)		+= tls/
+obj-$(CONFIG_QUIC)		+= quic/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX_SCM)		+= unix/
 obj-y				+= ipv6/
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index bbdd9c44f14e..88d3baf4af95 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,8 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o
+	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o \
+	     udp_ulp.o
 
 obj-$(CONFIG_BPFILTER) += bpfilter/
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aa9f2ec3dc46..e4a5f66b3141 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2778,6 +2778,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->pcflag |= UDPLITE_RECV_CC;
 		break;
 
+	case UDP_ULP:
+		return udp_setsockopt_ulp(sk, optval, optlen);
+
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -2846,6 +2849,9 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
 		val = up->pcrlen;
 		break;
 
+	case UDP_ULP:
+		return udp_getsockopt_ulp(sk, optval, optlen);
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/udp_ulp.c b/net/ipv4/udp_ulp.c
new file mode 100644
index 000000000000..3801ed7ad17d
--- /dev/null
+++ b/net/ipv4/udp_ulp.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Pluggable UDP upper layer protocol support, based on pluggable TCP upper
+ * layer protocol support.
+ *
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwatson@fb.com>. All rights reserved.
+ */
+
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/skmsg.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+
+static DEFINE_SPINLOCK(udp_ulp_list_lock);
+static LIST_HEAD(udp_ulp_list);
+
+/* Simple linear search, don't expect many entries! */
+static struct udp_ulp_ops *udp_ulp_find(const char *name)
+{
+	struct udp_ulp_ops *e;
+
+	list_for_each_entry_rcu(e, &udp_ulp_list, list,
+				lockdep_is_held(&udp_ulp_list_lock)) {
+		if (strcmp(e->name, name) == 0)
+			return e;
+	}
+
+	return NULL;
+}
+
+static const struct udp_ulp_ops *__udp_ulp_find_autoload(const char *name)
+{
+	const struct udp_ulp_ops *ulp = NULL;
+
+	rcu_read_lock();
+	ulp = udp_ulp_find(name);
+
+#ifdef CONFIG_MODULES
+	if (!ulp && capable(CAP_NET_ADMIN)) {
+		rcu_read_unlock();
+		request_module("udp-ulp-%s", name);
+		rcu_read_lock();
+		ulp = udp_ulp_find(name);
+	}
+#endif
+	if (!ulp || !try_module_get(ulp->owner))
+		ulp = NULL;
+
+	rcu_read_unlock();
+	return ulp;
+}
+
+/* Attach new upper layer protocol to the list
+ * of available protocols.
+ */
+int udp_register_ulp(struct udp_ulp_ops *ulp)
+{
+	int ret = 0;
+
+	spin_lock(&udp_ulp_list_lock);
+	if (udp_ulp_find(ulp->name))
+		ret = -EEXIST;
+	else
+		list_add_tail_rcu(&ulp->list, &udp_ulp_list);
+
+	spin_unlock(&udp_ulp_list_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(udp_register_ulp);
+
+void udp_unregister_ulp(struct udp_ulp_ops *ulp)
+{
+	spin_lock(&udp_ulp_list_lock);
+	list_del_rcu(&ulp->list);
+	spin_unlock(&udp_ulp_list_lock);
+
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(udp_unregister_ulp);
+
+void udp_cleanup_ulp(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	/* No sock_owned_by_me() check here as at the time the
+	 * stack calls this function, the socket is dead and
+	 * about to be destroyed.
+	 */
+	if (!inet->udp_ulp_ops)
+		return;
+
+	if (inet->udp_ulp_ops->release)
+		inet->udp_ulp_ops->release(sk);
+	module_put(inet->udp_ulp_ops->owner);
+
+	inet->udp_ulp_ops = NULL;
+}
+
+static int __udp_set_ulp(struct sock *sk, const struct udp_ulp_ops *ulp_ops)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int err;
+
+	err = -EEXIST;
+	if (inet->udp_ulp_ops)
+		goto out_err;
+
+	err = ulp_ops->init(sk);
+	if (err)
+		goto out_err;
+
+	inet->udp_ulp_ops = ulp_ops;
+	return 0;
+
+out_err:
+	module_put(ulp_ops->owner);
+	return err;
+}
+
+int udp_set_ulp(struct sock *sk, const char *name)
+{
+	struct sk_psock *psock = sk_psock_get(sk);
+	const struct udp_ulp_ops *ulp_ops;
+
+	if (psock) {
+		sk_psock_put(sk, psock);
+		return -EINVAL;
+	}
+
+	sock_owned_by_me(sk);
+	ulp_ops = __udp_ulp_find_autoload(name);
+	if (!ulp_ops)
+		return -ENOENT;
+
+	return __udp_set_ulp(sk, ulp_ops);
+}
+
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	char name[UDP_ULP_NAME_MAX];
+	int val, err;
+
+	if (!optlen || optlen > UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	val = strncpy_from_sockptr(name, optval, optlen);
+	if (val < 0)
+		return -EFAULT;
+
+	if (val == UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	name[val] = 0;
+	lock_sock(sk);
+	err = udp_set_ulp(sk, name);
+	release_sock(sk);
+	return err;
+}
+
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval, int __user *optlen)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int len;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	len = min_t(unsigned int, len, UDP_ULP_NAME_MAX);
+	if (len < 0)
+		return -EINVAL;
+
+	if (!inet->udp_ulp_ops) {
+		if (put_user(0, optlen))
+			return -EFAULT;
+		return 0;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, inet->udp_ulp_ops->name, len))
+		return -EFAULT;
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 4/6] net: Implement QUIC offload functions
  2022-08-03 16:40 ` Adel Abouchaev
                     ` (2 preceding siblings ...)
  2022-08-03 16:40   ` [RFC net-next 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
@ 2022-08-03 16:40   ` Adel Abouchaev
  2022-08-03 16:40   ` [RFC net-next 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
  2022-08-03 16:40   ` [RFC net-next 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-03 16:40 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add a connection hash to the context to support add and remove operations
on QUIC connections for the control plane, and lookup for the data plane.
Implement setsockopt() and add placeholders to add and delete Tx
connections.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/net/quic.h   |   49 ++
 net/ipv4/udp.c       |    8 +
 net/quic/Kconfig     |   16 +
 net/quic/Makefile    |    8 +
 net/quic/quic_main.c | 1400 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1481 insertions(+)
 create mode 100644 include/net/quic.h
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c

diff --git a/include/net/quic.h b/include/net/quic.h
new file mode 100644
index 000000000000..15e04ea08c53
--- /dev/null
+++ b/include/net/quic.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef INCLUDE_NET_QUIC_H
+#define INCLUDE_NET_QUIC_H
+
+#include <linux/mutex.h>
+#include <linux/rhashtable.h>
+#include <linux/skmsg.h>
+#include <uapi/linux/quic.h>
+
+#define QUIC_MAX_SHORT_HEADER_SIZE      25
+#define QUIC_MAX_CONNECTION_ID_SIZE     20
+#define QUIC_HDR_MASK_SIZE              16
+#define QUIC_MAX_GSO_FRAGS              16
+
+// Maximum IV and nonce sizes should be in sync with supported ciphers.
+#define QUIC_CIPHER_MAX_IV_SIZE		12
+#define QUIC_CIPHER_MAX_NONCE_SIZE	16
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_ANCILLARY_FLAGS    (QUIC_BYPASS_ENCRYPTION)
+
+#define QUIC_MAX_IOVEC_SEGMENTS		8
+#define QUIC_MAX_SG_ALLOC_ELEMENTS	32
+#define QUIC_MAX_PLAIN_PAGES		16
+#define QUIC_MAX_CIPHER_PAGES_ORDER	4
+
+struct quic_internal_crypto_context {
+	struct quic_connection_info	conn_info;
+	struct crypto_skcipher		*header_tfm;
+	struct crypto_aead		*packet_aead;
+};
+
+struct quic_connection_rhash {
+	struct rhash_head			node;
+	struct quic_internal_crypto_context	crypto_ctx;
+	struct rcu_head				rcu;
+};
+
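+/* Per-socket ULP state. cipher_page is a preallocated compound page used as
+ * the staging buffer for ciphertext, and sg_alloc holds the scatterlists for
+ * the plaintext and ciphertext of one sendmsg() call; sendmsg_mux is
+ * presumably intended to serialize their use across concurrent senders.
+ */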
+struct quic_context {
+	struct proto		*sk_proto;
+	struct rhashtable	tx_connections;
+	struct scatterlist	sg_alloc[QUIC_MAX_SG_ALLOC_ELEMENTS];
+	struct page		*cipher_page;
+	struct mutex		sendmsg_mux;
+	struct rcu_head		rcu;
+};
+
+#endif
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e4a5f66b3141..d14379b78e42 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,6 +113,7 @@
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
+#include <uapi/linux/quic.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6_stubs.h>
 #endif
@@ -1009,6 +1010,13 @@ static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
 			return -EINVAL;
 		*gso_size = *(__u16 *)CMSG_DATA(cmsg);
 		return 0;
+	case UDP_QUIC_ENCRYPT:
+		/* This option is handled in UDP_ULP and is only checked
+		 * here for the bypass bit
+		 */
+		if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+			return -EINVAL;
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/quic/Kconfig b/net/quic/Kconfig
new file mode 100644
index 000000000000..661cb989508a
--- /dev/null
+++ b/net/quic/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# QUIC configuration
+#
+config QUIC
+	tristate "QUIC encryption offload"
+	depends on INET
+	select CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_GCM
+	help
+	Enable kernel support for QUIC crypto offload. Currently only TX
+	encryption offload is supported. The kernel will perform
+	copy-during-encryption.
+
+	If unsure, say N.
diff --git a/net/quic/Makefile b/net/quic/Makefile
new file mode 100644
index 000000000000..928239c4d08c
--- /dev/null
+++ b/net/quic/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the QUIC subsystem
+#
+
+obj-$(CONFIG_QUIC) += quic.o
+
+quic-y := quic_main.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
new file mode 100644
index 000000000000..e738c8130a4f
--- /dev/null
+++ b/net/quic/quic_main.c
@@ -0,0 +1,1400 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <crypto/skcipher.h>
+#include <linux/bug.h>
+#include <linux/module.h>
+#include <linux/rhashtable.h>
+// Include header to use TLS constants for AEAD cipher.
+#include <net/tls.h>
+#include <net/quic.h>
+#include <net/udp.h>
+#include <uapi/linux/quic.h>
+
+static unsigned long af_init_done;
+static struct proto quic_v4_proto;
+static struct proto quic_v6_proto;
+static DEFINE_SPINLOCK(quic_proto_lock);
+
+static u32 quic_tx_connection_hash(const void *data, u32 len, u32 seed)
+{
+	return jhash(data, len, seed);
+}
+
+static u32 quic_tx_connection_hash_obj(const void *data, u32 len, u32 seed)
+{
+	const struct quic_connection_rhash *connhash = data;
+
+	return jhash(&connhash->crypto_ctx.conn_info.key,
+		     sizeof(struct quic_connection_info_key), seed);
+}
+
+static int quic_tx_connection_hash_cmp(struct rhashtable_compare_arg *arg,
+				       const void *ptr)
+{
+	const struct quic_connection_info_key *key = arg->key;
+	const struct quic_connection_rhash *x = ptr;
+
+	return !!memcmp(&x->crypto_ctx.conn_info.key,
+			key,
+			sizeof(struct quic_connection_info_key));
+}
+
+static const struct rhashtable_params quic_tx_connection_params = {
+	.key_len		= sizeof(struct quic_connection_info_key),
+	.head_offset		= offsetof(struct quic_connection_rhash, node),
+	.hashfn			= quic_tx_connection_hash,
+	.obj_hashfn		= quic_tx_connection_hash_obj,
+	.obj_cmpfn		= quic_tx_connection_hash_cmp,
+	.automatic_shrinking	= true,
+};
+
+static inline size_t quic_crypto_key_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_KEY_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_KEY_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_KEY_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return 0;
+}
+
+static inline size_t quic_crypto_tag_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_TAG_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_TAG_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return 0;
+}
+
+static inline size_t quic_crypto_iv_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return 0;
+}
+
+static inline size_t quic_crypto_nonce_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_128_SALT_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_256_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_256_SALT_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_CCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_CCM_128_SALT_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+			     TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+		       TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return 0;
+}
+
+static inline
+u8 *quic_payload_iv(struct quic_internal_crypto_context *crypto_ctx)
+{
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return crypto_ctx->conn_info.aes_gcm_128.payload_iv;
+	case TLS_CIPHER_AES_GCM_256:
+		return crypto_ctx->conn_info.aes_gcm_256.payload_iv;
+	case TLS_CIPHER_AES_CCM_128:
+		return crypto_ctx->conn_info.aes_ccm_128.payload_iv;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return crypto_ctx->conn_info.chacha20_poly1305.payload_iv;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return NULL;
+}
+
+static int
+quic_config_header_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_skcipher *tfm;
+	char *header_cipher;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_128.header_key;
+		break;
+	case TLS_CIPHER_AES_GCM_256:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_256.header_key;
+		break;
+	case TLS_CIPHER_AES_CCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_ccm_128.header_key;
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		header_cipher = "chacha20";
+		key = crypto_ctx->conn_info.chacha20_poly1305.header_key;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	tfm = crypto_alloc_skcipher(header_cipher, 0, 0);
+	if (IS_ERR(tfm)) {
+		rc = PTR_ERR(tfm);
+		goto out;
+	}
+
+	rc = crypto_skcipher_setkey(tfm, key,
+				    quic_crypto_key_size(crypto_ctx->conn_info
+							 .cipher_type));
+	if (rc) {
+		crypto_free_skcipher(tfm);
+		goto out;
+	}
+
+	crypto_ctx->header_tfm = tfm;
+
+out:
+	return rc;
+}
+
+static int
+quic_config_packet_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_aead *aead;
+	char *cipher_name;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128: {
+		key = crypto_ctx->conn_info.aes_gcm_128.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_GCM_256: {
+		key = crypto_ctx->conn_info.aes_gcm_256.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_CCM_128: {
+		key = crypto_ctx->conn_info.aes_ccm_128.payload_key;
+		cipher_name = "ccm(aes)";
+		break;
+	}
+	case TLS_CIPHER_CHACHA20_POLY1305: {
+		key = crypto_ctx->conn_info.chacha20_poly1305.payload_key;
+		cipher_name = "rfc7539(chacha20,poly1305)";
+		break;
+	}
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	aead = crypto_alloc_aead(cipher_name, 0, 0);
+	if (IS_ERR(aead)) {
+		rc = PTR_ERR(aead);
+		goto out;
+	}
+
+	rc = crypto_aead_setkey(aead, key,
+				quic_crypto_key_size(crypto_ctx->conn_info
+						     .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	rc = crypto_aead_setauthsize(aead,
+				     quic_crypto_tag_size(crypto_ctx->conn_info
+							  .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	crypto_ctx->packet_aead = aead;
+	goto out;
+
+free_aead:
+	crypto_free_aead(aead);
+
+out:
+	return rc;
+}
+
+static inline struct quic_context *quic_get_ctx(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	return (__force void *)rcu_access_pointer(inet->ulp_data);
+}
+
+static void quic_free_cipher_page(struct page *page)
+{
+	__free_pages(page, QUIC_MAX_CIPHER_PAGES_ORDER);
+}
+
+static struct quic_context *quic_ctx_create(void)
+{
+	struct quic_context *ctx;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+
+	mutex_init(&ctx->sendmsg_mux);
+	ctx->cipher_page = alloc_pages(GFP_KERNEL, QUIC_MAX_CIPHER_PAGES_ORDER);
+	if (!ctx->cipher_page)
+		goto out_err;
+
+	if (rhashtable_init(&ctx->tx_connections,
+			    &quic_tx_connection_params) < 0) {
+		quic_free_cipher_page(ctx->cipher_page);
+		goto out_err;
+	}
+
+	return ctx;
+
+out_err:
+	kfree(ctx);
+	return NULL;
+}
+
+static int quic_getsockopt(struct sock *sk, int level, int optname,
+			   char __user *optval, int __user *optlen)
+{
+	struct quic_context *ctx = quic_get_ctx(sk);
+
+	return ctx->sk_proto->getsockopt(sk, level, optname, optval, optlen);
+}
+
+static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	int rc = 0;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	connhash = kzalloc(sizeof(*connhash), GFP_KERNEL);
+	if (!connhash)
+		return -ENOMEM;
+
+	crypto_ctx = &connhash->crypto_ctx;
+	rc = copy_from_sockptr(&crypto_ctx->conn_info, optval,
+			       sizeof(crypto_ctx->conn_info));
+	if (rc) {
+		rc = -EFAULT;
+		goto err_crypto_info;
+	}
+
+	// create all TLS materials for packet and header decryption
+	rc = quic_config_header_crypto(crypto_ctx);
+	if (rc)
+		goto err_crypto_info;
+
+	rc = quic_config_packet_crypto(crypto_ctx);
+	if (rc)
+		goto err_free_skcipher;
+
+	// insert crypto data into hash per connection ID
+	rc = rhashtable_insert_fast(&ctx->tx_connections, &connhash->node,
+				    quic_tx_connection_params);
+	if (rc < 0)
+		goto err_free_ciphers;
+
+	return 0;
+
+err_free_ciphers:
+	crypto_free_aead(crypto_ctx->packet_aead);
+
+err_free_skcipher:
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+
+err_crypto_info:
+	// wipeout all crypto materials;
+	memzero_explicit(&connhash->crypto_ctx, sizeof(connhash->crypto_ctx));
+	kfree(connhash);
+	return rc;
+}
+
+static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	struct quic_connection_info conn_info;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&conn_info, optval, optlen))
+		return -EFAULT;
+
+	connhash = rhashtable_lookup_fast(&ctx->tx_connections,
+					  &conn_info.key,
+					  quic_tx_connection_params);
+	if (!connhash)
+		return -EINVAL;
+
+	rhashtable_remove_fast(&ctx->tx_connections,
+			       &connhash->node,
+			       quic_tx_connection_params);
+
+
+	crypto_ctx = &connhash->crypto_ctx;
+
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+	crypto_free_aead(crypto_ctx->packet_aead);
+	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	kfree(connhash);
+
+	return 0;
+}
+
+static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
+			      unsigned int optlen)
+{
+	int rc = 0;
+
+	switch (optname) {
+	case UDP_QUIC_ADD_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_add_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	case UDP_QUIC_DEL_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_del_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	default:
+		rc = -ENOPROTOOPT;
+		break;
+	}
+
+	return rc;
+}
+
+static int quic_setsockopt(struct sock *sk, int level, int optname,
+			   sockptr_t optval, unsigned int optlen)
+{
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	sk_proto = ctx->sk_proto;
+	rcu_read_unlock();
+
+	if (level == SOL_UDP &&
+	    (optname == UDP_QUIC_ADD_TX_CONNECTION ||
+	     optname == UDP_QUIC_DEL_TX_CONNECTION))
+		return do_quic_setsockopt(sk, optname, optval, optlen);
+
+	return sk_proto->setsockopt(sk, level, optname, optval, optlen);
+}
+
+static int
+quic_extract_ancillary_data(struct msghdr *msg,
+			    struct quic_tx_ancillary_data *ancillary_data,
+			    u16 *udp_pkt_size)
+{
+	struct cmsghdr *cmsg_hdr = NULL;
+	void *ancillary_data_ptr = NULL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	for_each_cmsghdr(cmsg_hdr, msg) {
+		if (!CMSG_OK(msg, cmsg_hdr))
+			return -EINVAL;
+
+		if (cmsg_hdr->cmsg_level != IPPROTO_UDP)
+			continue;
+
+		if (cmsg_hdr->cmsg_type == UDP_QUIC_ENCRYPT) {
+			if (cmsg_hdr->cmsg_len !=
+			    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+				return -EINVAL;
+			memcpy((void *)ancillary_data, CMSG_DATA(cmsg_hdr),
+			       sizeof(struct quic_tx_ancillary_data));
+			ancillary_data_ptr = cmsg_hdr;
+		} else if (cmsg_hdr->cmsg_type == UDP_SEGMENT) {
+			if (cmsg_hdr->cmsg_len != CMSG_LEN(sizeof(u16)))
+				return -EINVAL;
+			memcpy((void *)udp_pkt_size, CMSG_DATA(cmsg_hdr),
+			       sizeof(u16));
+		}
+	}
+
+	if (!ancillary_data_ptr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int quic_sendmsg_validate(struct msghdr *msg)
+{
+	if (!iter_is_iovec(&msg->msg_iter))
+		return -EINVAL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct quic_connection_rhash
+*quic_lookup_connection(struct quic_context *ctx,
+			u8 *conn_id,
+			struct quic_tx_ancillary_data *ancillary_data)
+{
+	struct quic_connection_info_key conn_key;
+
+	// Lookup connection information by the connection key.
+	memset(&conn_key, 0, sizeof(struct quic_connection_info_key));
+	// fill the connection id up to the max connection ID length
+	if (ancillary_data->conn_id_length > QUIC_MAX_CONNECTION_ID_SIZE)
+		return NULL;
+
+	conn_key.conn_id_length = ancillary_data->conn_id_length;
+	if (ancillary_data->conn_id_length)
+		memcpy(conn_key.conn_id,
+		       conn_id,
+		       ancillary_data->conn_id_length);
+	return rhashtable_lookup_fast(&ctx->tx_connections,
+				      &conn_key,
+				      quic_tx_connection_params);
+}
+
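+/* Upper bound on the number of scatterlist entries needed: one per page
+ * touched by the payload, plus one extra entry per packet boundary that can
+ * split a page, plus one spare.
+ */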
+static int quic_sg_capacity_from_msg(const size_t pkt_size,
+				     const off_t offset,
+				     const size_t length)
+{
+	size_t	pages = 0;
+	size_t	pkts = 0;
+
+	pages = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+	pkts = DIV_ROUND_UP(length, pkt_size);
+	return pages + pkts + 1;
+}
+
+static void quic_put_plain_user_pages(struct page **pages, size_t nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (i == 0 || pages[i] != pages[i - 1])
+			put_page(pages[i]);
+}
+
+static int quic_get_plain_user_pages(struct msghdr * const msg,
+				     struct page **pages,
+				     int *page_indices)
+{
+	size_t	nr_mapped = 0;
+	size_t	nr_pages = 0;
+	void	*data_addr;
+	void	*page_addr;
+	size_t	count = 0;
+	off_t	data_off;
+	int	ret = 0;
+	int	i;
+
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		data_addr = msg->msg_iter.iov[i].iov_base;
+		if (!i)
+			data_addr += msg->msg_iter.iov_offset;
+		page_addr =
+			(void *)((unsigned long)data_addr & PAGE_MASK);
+
+		data_off = (unsigned long)data_addr & ~PAGE_MASK;
+		nr_pages =
+			DIV_ROUND_UP(data_off + msg->msg_iter.iov[i].iov_len,
+				     PAGE_SIZE);
+		if (nr_mapped + nr_pages > QUIC_MAX_PLAIN_PAGES) {
+			quic_put_plain_user_pages(pages, nr_mapped);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		count = get_user_pages((unsigned long)page_addr, nr_pages, 1,
+				       pages, NULL);
+		if (count < nr_pages) {
+			quic_put_plain_user_pages(pages, nr_mapped + count);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		page_indices[i] = nr_mapped;
+		nr_mapped += count;
+		pages += count;
+	}
+	ret = nr_mapped;
+
+out:
+	return ret;
+}
+
+static int quic_sg_plain_from_mapped_msg(struct msghdr * const msg,
+					 struct page **plain_pages,
+					 void **iov_base_ptrs,
+					 void **iov_data_ptrs,
+					 const size_t plain_size,
+					 const size_t pkt_size,
+					 struct scatterlist * const sg_alloc,
+					 const size_t max_sg_alloc,
+					 struct scatterlist ** const sg_pkts,
+					 size_t *nr_plain_pages)
+{
+	int iov_page_indices[QUIC_MAX_IOVEC_SEGMENTS];
+	struct scatterlist *sg;
+	unsigned int pkt_i = 0;
+	ssize_t left_on_page;
+	size_t pkt_left;
+	unsigned int i;
+	size_t seg_len;
+	off_t page_ofs;
+	off_t seg_ofs;
+	int ret = 0;
+	int page_i;
+
+	if (msg->msg_iter.nr_segs >= QUIC_MAX_IOVEC_SEGMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = quic_get_plain_user_pages(msg, plain_pages, iov_page_indices);
+	if (ret < 0)
+		goto out;
+
+	*nr_plain_pages = ret;
+	sg = sg_alloc;
+	sg_pkts[pkt_i] = sg;
+	sg_unmark_end(sg);
+	pkt_left = pkt_size;
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		page_ofs = ((unsigned long)msg->msg_iter.iov[i].iov_base
+			   & (PAGE_SIZE - 1));
+		page_i = 0;
+		if (!i) {
+			page_ofs += msg->msg_iter.iov_offset;
+			while (page_ofs >= PAGE_SIZE) {
+				page_ofs -= PAGE_SIZE;
+				page_i++;
+			}
+		}
+
+		seg_len = msg->msg_iter.iov[i].iov_len;
+		page_i += iov_page_indices[i];
+
+		if (page_i >= QUIC_MAX_PLAIN_PAGES)
+			return -EFAULT;
+
+		seg_ofs = 0;
+		while (seg_ofs < seg_len) {
+			if (sg - sg_alloc > max_sg_alloc)
+				return -EFAULT;
+
+			sg_unmark_end(sg);
+			left_on_page = min_t(size_t, PAGE_SIZE - page_ofs,
+					     seg_len - seg_ofs);
+			if (left_on_page <= 0)
+				return -EFAULT;
+
+			if (left_on_page > pkt_left) {
+				sg_set_page(sg, plain_pages[page_i], pkt_left,
+					    page_ofs);
+				pkt_i++;
+				seg_ofs += pkt_left;
+				page_ofs += pkt_left;
+				sg_mark_end(sg);
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+				continue;
+			}
+			sg_set_page(sg, plain_pages[page_i], left_on_page,
+				    page_ofs);
+			page_i++;
+			page_ofs = 0;
+			seg_ofs += left_on_page;
+			pkt_left -= left_on_page;
+			if (pkt_left == 0 ||
+			    (seg_ofs == seg_len &&
+			     i == msg->msg_iter.nr_segs - 1)) {
+				sg_mark_end(sg);
+				pkt_i++;
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+			} else {
+				sg++;
+			}
+		}
+	}
+
+	if (pkt_left && pkt_left != pkt_size) {
+		pkt_i++;
+		sg_mark_end(sg);
+	}
+	ret = pkt_i;
+
+out:
+	return ret;
+}
+
+/* sg_alloc: allocated zeroed array of scatterlists
+ * cipher_page: preallocated compound page
+ */
+static int quic_sg_cipher_from_pkts(const size_t cipher_tag_size,
+				     const size_t plain_pkt_size,
+				     const size_t plain_size,
+				     struct page * const cipher_page,
+				     struct scatterlist * const sg_alloc,
+				     const size_t nr_sg_alloc,
+				     struct scatterlist ** const sg_cipher)
+{
+	const size_t cipher_pkt_size = plain_pkt_size + cipher_tag_size;
+	size_t pkts = DIV_ROUND_UP(plain_size, plain_pkt_size);
+	struct scatterlist *sg = sg_alloc;
+	int pkt_i;
+	void *ptr;
+
+	if (pkts > nr_sg_alloc)
+		return -EINVAL;
+
+	ptr = page_address(cipher_page);
+	for (pkt_i = 0; pkt_i < pkts;
+		++pkt_i, ptr += cipher_pkt_size, ++sg) {
+		sg_set_buf(sg, ptr, cipher_pkt_size);
+		sg_mark_end(sg);
+		sg_cipher[pkt_i] = sg;
+	}
+	return pkts;
+}
+
+/* fast copy from scatterlist to a buffer assuming that all pages are
+ * available in kernel memory.
+ */
+static int quic_sg_pcopy_to_buffer_kernel(struct scatterlist *sg,
+					  u8 *buffer,
+					  size_t bytes_to_copy,
+					  off_t offset_to_read)
+{
+	off_t sg_remain = sg->length;
+	size_t to_copy;
+
+	if (!bytes_to_copy)
+		return 0;
+
+	// skip to offset first
+	while (offset_to_read > 0) {
+		if (!sg_remain)
+			return -EINVAL;
+		if (offset_to_read < sg_remain) {
+			sg_remain -= offset_to_read;
+			break;
+		}
+		offset_to_read -= sg_remain;
+		sg = sg_next(sg);
+		if (!sg)
+			return -EINVAL;
+		sg_remain = sg->length;
+	}
+
+	// traverse sg list from offset to offset + bytes_to_copy
+	while (bytes_to_copy) {
+		to_copy = min_t(size_t, bytes_to_copy, sg_remain);
+		if (!to_copy)
+			return -EINVAL;
+		memcpy(buffer, sg_virt(sg) + (sg->length - sg_remain), to_copy);
+		buffer += to_copy;
+		bytes_to_copy -= to_copy;
+		if (bytes_to_copy) {
+			sg = sg_next(sg);
+			if (!sg)
+				return -EINVAL;
+			sg_remain = sg->length;
+		}
+	}
+
+	return 0;
+}
+
+static int quic_copy_header(struct scatterlist *sg_plain,
+			    u8 *buf, const size_t buf_len,
+			    const size_t conn_id_len)
+{
+	u8 *pkt = sg_virt(sg_plain);
+	size_t hdr_len;
+
+	hdr_len = 1 + conn_id_len + ((*pkt & 0x03) + 1);
+	if (hdr_len > QUIC_MAX_SHORT_HEADER_SIZE || hdr_len > buf_len)
+		return -EINVAL;
+
+	WARN_ON_ONCE(quic_sg_pcopy_to_buffer_kernel(sg_plain, buf, hdr_len, 0));
+	return hdr_len;
+}
+
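+/* Reconstruct the full packet number from the truncated value in the short
+ * header and the expected next packet number supplied in ancillary data,
+ * following the sample algorithm in RFC 9000, Appendix A.
+ */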
+static u64 quic_unpack_pkt_num(struct quic_tx_ancillary_data * const control,
+			       const u8 * const hdr,
+			       const off_t payload_crypto_off)
+{
+	u64 truncated_pn = 0;
+	u64 candidate_pn;
+	u64 expected_pn;
+	u64 pn_hwin;
+	u64 pn_mask;
+	u64 pn_len;
+	u64 pn_win;
+	int i;
+
+	pn_len = (hdr[0] & 0x03) + 1;
+	expected_pn = control->next_pkt_num;
+
+	for (i = 1 + control->conn_id_length; i < payload_crypto_off; ++i) {
+		truncated_pn <<= 8;
+		truncated_pn |= hdr[i];
+	}
+
+	pn_win = 1ULL << (pn_len << 3);
+	pn_hwin = pn_win >> 1;
+	pn_mask = pn_win - 1;
+	candidate_pn = (expected_pn & ~pn_mask) | truncated_pn;
+
+	if (expected_pn > pn_hwin &&
+	    candidate_pn <= expected_pn - pn_hwin &&
+	    candidate_pn < (1ULL << 62) - pn_win)
+		return candidate_pn + pn_win;
+
+	if (candidate_pn > expected_pn + pn_hwin &&
+	    candidate_pn >= pn_win)
+		return candidate_pn - pn_win;
+
+	return candidate_pn;
+}
+
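+/* Derive the header protection mask from a sample of the ciphertext: AES in
+ * ECB mode for the AES-based ciphers, or ChaCha20 with the sample split into
+ * counter and nonce, as described in RFC 9001, Section 5.4.
+ */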
+static int
+quic_construct_header_prot_mask(struct quic_internal_crypto_context *crypto_ctx,
+				struct skcipher_request *hdr_mask_req,
+				struct scatterlist *sg_cipher_pkt,
+				off_t sample_offset,
+				u8 *hdr_mask)
+{
+	u8 *sample = sg_virt(sg_cipher_pkt) + sample_offset;
+	u8 hdr_ctr[sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE];
+	struct scatterlist sg_cipher_sample;
+	struct scatterlist sg_hdr_mask;
+	struct crypto_wait wait_header;
+	u32	counter;
+
+	BUILD_BUG_ON(QUIC_HDR_MASK_SIZE
+		     < sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE);
+
+	// cipher pages are continuous, get the pointer to the sg data directly,
+	// page is allocated in kernel
+	sg_init_one(&sg_cipher_sample, sample, QUIC_HDR_MASK_SIZE);
+	sg_init_one(&sg_hdr_mask, hdr_mask, QUIC_HDR_MASK_SIZE);
+	skcipher_request_set_callback(hdr_mask_req, 0, crypto_req_done,
+				      &wait_header);
+
+	if (crypto_ctx->conn_info.cipher_type == TLS_CIPHER_CHACHA20_POLY1305) {
+		counter = cpu_to_le32(*((u32 *)sample));
+		memset(hdr_ctr, 0, sizeof(hdr_ctr));
+		memcpy((u8 *)hdr_ctr, (u8 *)&counter, sizeof(u32));
+		memcpy((u8 *)hdr_ctr + sizeof(u32),
+		       (sample + sizeof(u32)),
+		       QUIC_CIPHER_MAX_IV_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, 5, hdr_ctr);
+	} else {
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, QUIC_HDR_MASK_SIZE,
+					   NULL);
+	}
+
+	return crypto_wait_req(crypto_skcipher_encrypt(hdr_mask_req),
+			       &wait_header);
+}
+
+static int quic_protect_header(struct quic_internal_crypto_context *crypto_ctx,
+			       struct quic_tx_ancillary_data *control,
+			       struct skcipher_request *hdr_mask_req,
+			       struct scatterlist *sg_cipher_pkt,
+			       int payload_crypto_off)
+{
+	u8 hdr_mask[QUIC_HDR_MASK_SIZE];
+	off_t quic_pkt_num_off;
+	u8 quic_pkt_num_len;
+	u8 *cipher_hdr;
+	int err;
+	int i;
+
+	quic_pkt_num_off = 1 + control->conn_id_length;
+	quic_pkt_num_len = payload_crypto_off - quic_pkt_num_off;
+
+	if (quic_pkt_num_len > 4)
+		return -EPERM;
+
+	err = quic_construct_header_prot_mask(crypto_ctx, hdr_mask_req,
+					      sg_cipher_pkt,
+					      payload_crypto_off +
+					      (4 - quic_pkt_num_len),
+					      hdr_mask);
+	if (unlikely(err))
+		return err;
+
+	cipher_hdr = sg_virt(sg_cipher_pkt);
+	// protect the public flags
+	cipher_hdr[0] ^= (hdr_mask[0] & 0x1f);
+
+	for (i = 0; i < quic_pkt_num_len; ++i)
+		cipher_hdr[quic_pkt_num_off + i] ^= hdr_mask[1 + i];
+
+	return 0;
+}
+
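+/* Form the per-packet AEAD nonce by XORing the packet number, left-padded to
+ * the nonce length, into the IV (RFC 9001, Section 5.3).
+ */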
+static
+void quic_construct_ietf_nonce(u8 *nonce,
+			       struct quic_internal_crypto_context *crypto_ctx,
+			       u64 quic_pkt_num)
+{
+	u8 *iv = quic_payload_iv(crypto_ctx);
+	int i;
+
+	for (i = quic_crypto_nonce_size(crypto_ctx->conn_info.cipher_type) - 1;
+	     i >= 0 && quic_pkt_num;
+	     --i, quic_pkt_num >>= 8)
+		nonce[i] = iv[i] ^ (u8)quic_pkt_num;
+
+	for (; i >= 0; --i)
+		nonce[i] = iv[i];
+}
+
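+/* Hand the fully encrypted ciphertext page to the underlying UDP sendmsg()
+ * path as a single kernel iovec.
+ */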
+ssize_t quic_sendpage(struct quic_context *ctx,
+		      struct sock *sk,
+		      struct msghdr *msg,
+		      const size_t cipher_size,
+		      struct page * const cipher_page)
+{
+	struct kvec iov;
+	ssize_t ret;
+
+	iov.iov_base = page_address(cipher_page);
+	iov.iov_len = cipher_size;
+	iov_iter_kvec(&msg->msg_iter, WRITE, &iov, 1, cipher_size);
+	ret = security_socket_sendmsg(sk->sk_socket, msg, msg_data_left(msg));
+	if (ret)
+		return ret;
+
+	ret = ctx->sk_proto->sendmsg(sk, msg, msg_data_left(msg));
+	WARN_ON(ret == -EIOCBQUEUED);
+	return ret;
+}
+
+static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_internal_crypto_context *crypto_ctx = NULL;
+	struct scatterlist *sg_cipher_pkts[QUIC_MAX_GSO_FRAGS];
+	struct scatterlist *sg_plain_pkts[QUIC_MAX_GSO_FRAGS];
+	struct page *plain_pages[QUIC_MAX_PLAIN_PAGES];
+	void *plain_base_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	void *plain_data_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	struct msghdr msg_cipher = {
+		.msg_name = msg->msg_name,
+		.msg_namelen = msg->msg_namelen,
+		.msg_flags = msg->msg_flags,
+		.msg_control = msg->msg_control,
+		.msg_controllen = msg->msg_controllen,
+	};
+	struct quic_connection_rhash *connhash = NULL;
+	struct quic_connection_info *conn_info = NULL;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	u8 hdr_buf[QUIC_MAX_SHORT_HEADER_SIZE];
+	struct skcipher_request *hdr_mask_req;
+	struct quic_tx_ancillary_data control;
+	u8 nonce[QUIC_CIPHER_MAX_NONCE_SIZE];
+	struct aead_request *aead_req = NULL;
+	struct scatterlist *sg_cipher = NULL;
+	struct udp_sock *up = udp_sk(sk);
+	struct scatterlist *sg_plain = NULL;
+	u16 gso_pkt_size = up->gso_size;
+	size_t last_plain_pkt_size = 0;
+	off_t	payload_crypto_offset;
+	struct crypto_aead *tfm = NULL;
+	size_t nr_plain_pages = 0;
+	struct crypto_wait waiter;
+	size_t nr_sg_cipher_pkts;
+	size_t nr_sg_plain_pkts;
+	ssize_t hdr_buf_len = 0;
+	size_t nr_sg_alloc = 0;
+	size_t plain_pkt_size;
+	u64	full_pkt_num;
+	size_t cipher_size;
+	size_t plain_size;
+	size_t pkt_size;
+	size_t tag_size;
+	int ret = 0;
+	int pkt_i;
+	int err;
+
+	memset(&hdr_buf[0], 0, QUIC_MAX_SHORT_HEADER_SIZE);
+	hdr_buf_len = copy_from_iter(hdr_buf, QUIC_MAX_SHORT_HEADER_SIZE,
+				     &msg->msg_iter);
+	if (hdr_buf_len <= 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+	iov_iter_revert(&msg->msg_iter, hdr_buf_len);
+
+	ctx = quic_get_ctx(sk);
+
+	// Bypass for anything that is guaranteed not QUIC.
+	plain_size = len;
+
+	if (plain_size < 2)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Bypass for other than short header.
+	if ((hdr_buf[0] & 0xc0) != 0x40)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Crypto adds a tag after the packet. Corking a payload would produce
+	// a crypto tag after each portion. Use GSO instead.
+	if ((msg->msg_flags & MSG_MORE) || up->pending) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = quic_sendmsg_validate(msg);
+	if (ret)
+		goto out;
+
+	ret = quic_extract_ancillary_data(msg, &control, &gso_pkt_size);
+	if (ret)
+		goto out;
+
+	// Reserved bits with ancillary data present are an error.
+	if (control.flags & ~QUIC_ANCILLARY_FLAGS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Bypass offload on request. First packet bypass applies to all
+	// packets in the GSO pack.
+	if (control.flags & QUIC_BYPASS_ENCRYPTION)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	if (hdr_buf_len < 1 + control.conn_id_length) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Fetch the flow
+	connhash = quic_lookup_connection(ctx, &hdr_buf[1], &control);
+	if (!connhash) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	crypto_ctx = &connhash->crypto_ctx;
+	conn_info = &crypto_ctx->conn_info;
+
+	tag_size = quic_crypto_tag_size(crypto_ctx->conn_info.cipher_type);
+
+	// For GSO, use the GSO size minus cipher tag size as the packet size;
+	// for non-GSO, use the size of the whole plaintext.
+	// Reduce the packet size by tag size to keep the original packet size
+	// for the rest of the UDP path in the stack.
+	if (!gso_pkt_size) {
+		plain_pkt_size = plain_size;
+	} else {
+		if (gso_pkt_size < tag_size) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		plain_pkt_size = gso_pkt_size - tag_size;
+	}
+
+	// Build scatterlist from the input data, split by GSO minus the
+	// crypto tag size.
+	nr_sg_alloc = quic_sg_capacity_from_msg(plain_pkt_size,
+						msg->msg_iter.iov_offset,
+						plain_size);
+	if ((nr_sg_alloc * 2) >= QUIC_MAX_SG_ALLOC_ELEMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	sg_plain = ctx->sg_alloc;
+	sg_cipher = sg_plain + nr_sg_alloc;
+
+	ret = quic_sg_plain_from_mapped_msg(msg, plain_pages,
+					    plain_base_ptrs,
+					    plain_data_ptrs, plain_size,
+					    plain_pkt_size, sg_plain,
+					    nr_sg_alloc, sg_plain_pkts,
+					    &nr_plain_pages);
+
+	if (ret < 0)
+		goto out;
+
+	nr_sg_plain_pkts = ret;
+	last_plain_pkt_size = plain_size % plain_pkt_size;
+	if (!last_plain_pkt_size)
+		last_plain_pkt_size = plain_pkt_size;
+
+	// Build scatterlist for the ciphertext, split by GSO.
+	cipher_size = plain_size + nr_sg_plain_pkts * tag_size;
+
+	if (DIV_ROUND_UP(cipher_size, PAGE_SIZE)
+	    >= (1 << QUIC_MAX_CIPHER_PAGES_ORDER)) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	ret = quic_sg_cipher_from_pkts(tag_size, plain_pkt_size, plain_size,
+				       ctx->cipher_page, sg_cipher, nr_sg_alloc,
+				       sg_cipher_pkts);
+	if (ret < 0)
+		goto out_put_pages;
+
+	nr_sg_cipher_pkts = ret;
+
+	if (nr_sg_plain_pkts != nr_sg_cipher_pkts) {
+		ret = -EINVAL;
+		goto out_put_pages;
+	}
+
+	// Encrypt and protect header for each packet individually.
+	tfm = crypto_ctx->packet_aead;
+	crypto_aead_clear_flags(tfm, ~0);
+	aead_req = aead_request_alloc(tfm, GFP_KERNEL);
+	if (!aead_req) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	hdr_mask_req = skcipher_request_alloc(crypto_ctx->header_tfm,
+					      GFP_KERNEL);
+	if (!hdr_mask_req) {
+		aead_request_free(aead_req);
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	for (pkt_i = 0; pkt_i < nr_sg_plain_pkts; ++pkt_i) {
+		payload_crypto_offset =
+			quic_copy_header(sg_plain_pkts[pkt_i],
+					 hdr_buf,
+					 sizeof(hdr_buf),
+					 control.conn_id_length);
+
+		full_pkt_num = quic_unpack_pkt_num(&control, hdr_buf,
+						   payload_crypto_offset);
+
+		pkt_size = (pkt_i + 1 < nr_sg_plain_pkts
+				? plain_pkt_size
+				: last_plain_pkt_size);
+		/* pkt_size is unsigned, check the bound before
+		 * subtracting the part covered by the header.
+		 */
+		if (pkt_size < payload_crypto_offset) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		pkt_size -= payload_crypto_offset;
+
+		/* Construct nonce and initialize request */
+		quic_construct_ietf_nonce(nonce, crypto_ctx, full_pkt_num);
+
+		/* Encrypt the body */
+		aead_request_set_callback(aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG
+					  | CRYPTO_TFM_REQ_MAY_SLEEP,
+					  crypto_req_done, &waiter);
+		aead_request_set_crypt(aead_req, sg_plain_pkts[pkt_i],
+				       sg_cipher_pkts[pkt_i],
+				       pkt_size,
+				       nonce);
+		aead_request_set_ad(aead_req, payload_crypto_offset);
+		err = crypto_wait_req(crypto_aead_encrypt(aead_req), &waiter);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+
+		/* Protect the header */
+		memcpy(sg_virt(sg_cipher_pkts[pkt_i]), hdr_buf,
+		       payload_crypto_offset);
+
+		err = quic_protect_header(crypto_ctx, &control,
+					  hdr_mask_req,
+					  sg_cipher_pkts[pkt_i],
+					  payload_crypto_offset);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+	}
+	skcipher_request_free(hdr_mask_req);
+	aead_request_free(aead_req);
+
+	// Deliver to the next layer.
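+	// When the underlying protocol supports sendpage, push the
+	// address and ancillary data with a zero-length sendmsg marked
+	// MSG_MORE, then append the ciphertext page; otherwise hand the
+	// ciphertext page to quic_sendpage().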
+	if (ctx->sk_proto->sendpage) {
+		msg_cipher.msg_flags |= MSG_MORE;
+		err = ctx->sk_proto->sendmsg(sk, &msg_cipher, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+
+		err = ctx->sk_proto->sendpage(sk, ctx->cipher_page, 0,
+					      cipher_size, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+		if (err != cipher_size) {
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		ret = plain_size;
+	} else {
+		ret = quic_sendpage(ctx, sk, &msg_cipher, cipher_size,
+				    ctx->cipher_page);
+		// indicate full plaintext transmission to the caller.
+		if (ret > 0)
+			ret = plain_size;
+	}
+
+out_put_pages:
+	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
+
+out:
+	return ret;
+}
+
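+/* sendmsg entry point installed in sk_prot. Serialize the calls into
+ * quic_sendmsg() because the per-context scatterlist array and cipher
+ * page are shared between transmissions.
+ */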
+static int quic_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_context *ctx;
+	int ret;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	rcu_read_unlock();
+	if (!ctx)
+		return -EINVAL;
+
+	mutex_lock(&ctx->sendmsg_mux);
+	ret = quic_sendmsg(sk, msg, len);
+	mutex_unlock(&ctx->sendmsg_mux);
+	return ret;
+}
+
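+/* Tear down all Tx connection state, restore the original protocol
+ * operations and free the ULP context. Called from quic_release()
+ * with the socket lock held.
+ */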
+static void quic_release_resources(struct sock *sk)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_connection_rhash *connhash;
+	struct inet_sock *inet = inet_sk(sk);
+	struct rhashtable_iter hti;
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	rcu_read_unlock();
+	if (!ctx)
+		return;
+
+	/* The caller holds the socket lock, so the context cannot be
+	 * freed under us; do not keep the RCU read lock across the
+	 * teardown below, rhashtable_destroy() may sleep.
+	 */
+	sk_proto = ctx->sk_proto;
+
+	rhashtable_walk_enter(&ctx->tx_connections, &hti);
+	rhashtable_walk_start(&hti);
+
+	while ((connhash = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(connhash)) {
+			if (PTR_ERR(connhash) == -EAGAIN)
+				continue;
+			break;
+		}
+
+		crypto_ctx = &connhash->crypto_ctx;
+		crypto_free_aead(crypto_ctx->packet_aead);
+		crypto_free_skcipher(crypto_ctx->header_tfm);
+		memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	}
+
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+	rhashtable_destroy(&ctx->tx_connections);
+
+	if (ctx->cipher_page) {
+		quic_free_cipher_page(ctx->cipher_page);
+		ctx->cipher_page = NULL;
+	}
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, NULL);
+	WRITE_ONCE(sk->sk_prot, sk_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+
+	kfree_rcu(ctx, rcu);
+}
+
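+/* Build the per-address-family QUIC proto once by copying the
+ * original UDP proto and overriding the QUIC entry points.
+ */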
+static void
+quic_prep_protos(unsigned int af, struct proto *proto, const struct proto *base)
+{
+	if (likely(test_bit(af, &af_init_done)))
+		return;
+
+	spin_lock(&quic_proto_lock);
+	if (test_bit(af, &af_init_done))
+		goto out_unlock;
+
+	*proto			= *base;
+	proto->setsockopt	= quic_setsockopt;
+	proto->getsockopt	= quic_getsockopt;
+	proto->sendmsg		= quic_sendmsg_locked;
+
+	smp_mb__before_atomic(); /* proto calls should be visible first */
+	set_bit(af, &af_init_done);
+
+out_unlock:
+	spin_unlock(&quic_proto_lock);
+}
+
+static void quic_update_proto(struct sock *sk, struct quic_context *ctx)
+{
+	struct proto *udp_proto, *quic_proto;
+	struct inet_sock *inet = inet_sk(sk);
+
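+	/* Keep the original proto so that quic_sendmsg() can chain into
+	 * it and quic_release_resources() can restore it later.
+	 */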
+	udp_proto = READ_ONCE(sk->sk_prot);
+	ctx->sk_proto = udp_proto;
+	quic_proto = sk->sk_family == AF_INET ? &quic_v4_proto : &quic_v6_proto;
+
+	quic_prep_protos(sk->sk_family, quic_proto, udp_proto);
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, ctx);
+	WRITE_ONCE(sk->sk_prot, quic_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+}
+
+static int quic_init(struct sock *sk)
+{
+	struct quic_context *ctx;
+
+	ctx = quic_ctx_create();
+	if (!ctx)
+		return -ENOMEM;
+
+	quic_update_proto(sk, ctx);
+
+	return 0;
+}
+
+static void quic_release(struct sock *sk)
+{
+	lock_sock(sk);
+	quic_release_resources(sk);
+	release_sock(sk);
+}
+
+static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
+	.name		= "quic-crypto",
+	.owner		= THIS_MODULE,
+	.init		= quic_init,
+	.release	= quic_release,
+};
+
+static int __init quic_register(void)
+{
+	udp_register_ulp(&quic_ulp_ops);
+	return 0;
+}
+
+static void __exit quic_unregister(void)
+{
+	udp_unregister_ulp(&quic_ulp_ops);
+}
+
+module_init(quic_register);
+module_exit(quic_unregister);
+
+MODULE_DESCRIPTION("QUIC crypto ULP");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_UDP_ULP("quic-crypto");
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 5/6] net: Add flow counters and Tx processing error counter
  2022-08-03 16:40 ` Adel Abouchaev
                     ` (3 preceding siblings ...)
  2022-08-03 16:40   ` [RFC net-next 4/6] net: Implement QUIC offload functions Adel Abouchaev
@ 2022-08-03 16:40   ` Adel Abouchaev
  2022-08-03 16:40   ` [RFC net-next 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-03 16:40 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add flow counters. The total flow counter is cumulative, the current counter
shows the number of flows currently in flight, and the error counter
accumulates the number of errors seen during Tx processing.
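
For illustration only (the values are made up), with two offloaded
connections currently active and five created in total, the new
/proc/net/quic_stat file added below would read roughly:

  QuicCurrTxSw                        2
  QuicTxSw                            5
  QuicTxSwError                       0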

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/net/netns/mib.h   |  3 +++
 include/net/quic.h        | 10 +++++++++
 include/net/snmp.h        |  6 +++++
 include/uapi/linux/snmp.h | 11 ++++++++++
 net/quic/Makefile         |  2 +-
 net/quic/quic_main.c      | 46 +++++++++++++++++++++++++++++++++++++++
 net/quic/quic_proc.c      | 45 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 net/quic/quic_proc.c

diff --git a/include/net/netns/mib.h b/include/net/netns/mib.h
index 7e373664b1e7..dcbba3d1ceec 100644
--- a/include/net/netns/mib.h
+++ b/include/net/netns/mib.h
@@ -24,6 +24,9 @@ struct netns_mib {
 #if IS_ENABLED(CONFIG_TLS)
 	DEFINE_SNMP_STAT(struct linux_tls_mib, tls_statistics);
 #endif
+#if IS_ENABLED(CONFIG_QUIC)
+	DEFINE_SNMP_STAT(struct linux_quic_mib, quic_statistics);
+#endif
 #ifdef CONFIG_MPTCP
 	DEFINE_SNMP_STAT(struct mptcp_mib, mptcp_statistics);
 #endif
diff --git a/include/net/quic.h b/include/net/quic.h
index 15e04ea08c53..b6327f3b7632 100644
--- a/include/net/quic.h
+++ b/include/net/quic.h
@@ -25,6 +25,16 @@
 #define QUIC_MAX_PLAIN_PAGES		16
 #define QUIC_MAX_CIPHER_PAGES_ORDER	4
 
+#define __QUIC_INC_STATS(net, field)				\
+	__SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_INC_STATS(net, field)				\
+	SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_DEC_STATS(net, field)				\
+	SNMP_DEC_STATS((net)->mib.quic_statistics, field)
+
+int __net_init quic_proc_init(struct net *net);
+void __net_exit quic_proc_fini(struct net *net);
+
 struct quic_internal_crypto_context {
 	struct quic_connection_info	conn_info;
 	struct crypto_skcipher		*header_tfm;
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 468a67836e2f..f94680a3e9e8 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -117,6 +117,12 @@ struct linux_tls_mib {
 	unsigned long	mibs[LINUX_MIB_TLSMAX];
 };
 
+/* Linux QUIC */
+#define LINUX_MIB_QUICMAX	__LINUX_MIB_QUICMAX
+struct linux_quic_mib {
+	unsigned long	mibs[LINUX_MIB_QUICMAX];
+};
+
 #define DEFINE_SNMP_STAT(type, name)	\
 	__typeof__(type) __percpu *name
 #define DEFINE_SNMP_STAT_ATOMIC(type, name)	\
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 904909d020e2..708f62e28c9d 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -347,4 +347,15 @@ enum
 	__LINUX_MIB_TLSMAX
 };
 
+/* linux QUIC mib definitions */
+enum
+{
+	LINUX_MIB_QUICNUM = 0,
+	LINUX_MIB_QUICCURRTXSW,			/* QuicCurrTxSw */
+	LINUX_MIB_QUICTXSW,			/* QuicTxSw */
+	LINUX_MIB_QUICTXSWERROR,		/* QuicTxSwError */
+	__LINUX_MIB_QUICMAX
+};
+
+
 #endif	/* _LINUX_SNMP_H */
diff --git a/net/quic/Makefile b/net/quic/Makefile
index 928239c4d08c..a885cd8bc4e0 100644
--- a/net/quic/Makefile
+++ b/net/quic/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_QUIC) += quic.o
 
-quic-y := quic_main.o
+quic-y := quic_main.o quic_proc.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
index e738c8130a4f..eb0fdeabd3c4 100644
--- a/net/quic/quic_main.c
+++ b/net/quic/quic_main.c
@@ -362,6 +362,8 @@ static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
 	if (rc < 0)
 		goto err_free_ciphers;
 
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSW);
 	return 0;
 
 err_free_ciphers:
@@ -411,6 +413,7 @@ static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
 	crypto_free_aead(crypto_ctx->packet_aead);
 	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
 	kfree(connhash);
+	QUIC_DEC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
 
 	return 0;
 }
@@ -436,6 +439,9 @@ static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 		break;
 	}
 
+	if (rc)
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return rc;
 }
 
@@ -1242,6 +1248,9 @@ static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
 
 out:
+	if (unlikely(ret < 0))
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return ret;
 }
 
@@ -1374,6 +1383,36 @@ static void quic_release(struct sock *sk)
 	release_sock(sk);
 }
 
+static int __net_init quic_init_net(struct net *net)
+{
+	int err;
+
+	net->mib.quic_statistics = alloc_percpu(struct linux_quic_mib);
+	if (!net->mib.quic_statistics)
+		return -ENOMEM;
+
+	err = quic_proc_init(net);
+	if (err)
+		goto err_free_stats;
+
+	return 0;
+
+err_free_stats:
+	free_percpu(net->mib.quic_statistics);
+	return err;
+}
+
+static void __net_exit quic_exit_net(struct net *net)
+{
+	quic_proc_fini(net);
+	free_percpu(net->mib.quic_statistics);
+}
+
+static struct pernet_operations quic_proc_ops = {
+	.init = quic_init_net,
+	.exit = quic_exit_net,
+};
+
 static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 	.name		= "quic-crypto",
 	.owner		= THIS_MODULE,
@@ -1383,6 +1422,12 @@ static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 
 static int __init quic_register(void)
 {
+	int err;
+
+	err = register_pernet_subsys(&quic_proc_ops);
+	if (err)
+		return err;
+
 	udp_register_ulp(&quic_ulp_ops);
 	return 0;
 }
@@ -1390,6 +1435,7 @@ static int __init quic_register(void)
 static void __exit quic_unregister(void)
 {
 	udp_unregister_ulp(&quic_ulp_ops);
+	unregister_pernet_subsys(&quic_proc_ops);
 }
 
 module_init(quic_register);
diff --git a/net/quic/quic_proc.c b/net/quic/quic_proc.c
new file mode 100644
index 000000000000..cb4fe7a589b5
--- /dev/null
+++ b/net/quic/quic_proc.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Meta Platforms, Inc. */
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <net/snmp.h>
+#include <net/quic.h>
+
+#ifdef CONFIG_PROC_FS
+static const struct snmp_mib quic_mib_list[] = {
+	SNMP_MIB_ITEM("QuicCurrTxSw", LINUX_MIB_QUICCURRTXSW),
+	SNMP_MIB_ITEM("QuicTxSw", LINUX_MIB_QUICTXSW),
+	SNMP_MIB_ITEM("QuicTxSwError", LINUX_MIB_QUICTXSWERROR),
+	SNMP_MIB_SENTINEL
+};
+
+static int quic_statistics_seq_show(struct seq_file *seq, void *v)
+{
+	unsigned long buf[LINUX_MIB_QUICMAX] = {};
+	struct net *net = seq->private;
+	int i;
+
+	snmp_get_cpu_field_batch(buf, quic_mib_list, net->mib.quic_statistics);
+	for (i = 0; quic_mib_list[i].name; i++)
+		seq_printf(seq, "%-32s\t%lu\n", quic_mib_list[i].name, buf[i]);
+
+	return 0;
+}
+#endif
+
+int __net_init quic_proc_init(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	if (!proc_create_net_single("quic_stat", 0444, net->proc_net,
+				    quic_statistics_seq_show, NULL))
+		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
+
+	return 0;
+}
+
+void __net_exit quic_proc_fini(struct net *net)
+{
+	remove_proc_entry("quic_stat", net->proc_net);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next 6/6] net: Add self tests for ULP operations, flow setup and crypto tests
  2022-08-03 16:40 ` Adel Abouchaev
                     ` (4 preceding siblings ...)
  2022-08-03 16:40   ` [RFC net-next 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
@ 2022-08-03 16:40   ` Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-03 16:40 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add self tests for ULP operations, flow setup and crypto tests.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    2 +-
 tools/testing/selftests/net/quic.c     | 1024 ++++++++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   45 ++
 4 files changed, 1071 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index ffc35a22e914..bd4967e57803 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -38,3 +38,4 @@ ioam6_parser
 toeplitz
 tun
 cmsg_sender
+quic
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index db05b3764b77..aee89b0458b4 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -54,7 +54,7 @@ TEST_GEN_FILES += ipsec
 TEST_GEN_FILES += ioam6_parser
 TEST_GEN_FILES += gro
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
-TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun
+TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun quic
 TEST_GEN_FILES += toeplitz
 TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
diff --git a/tools/testing/selftests/net/quic.c b/tools/testing/selftests/net/quic.c
new file mode 100644
index 000000000000..20e425003fcb
--- /dev/null
+++ b/tools/testing/selftests/net/quic.c
@@ -0,0 +1,1024 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/limits.h>
+#include <linux/quic.h>
+#include <linux/socket.h>
+#include <linux/tls.h>
+#include <linux/tcp.h>
+#include <linux/types.h>
+#include <linux/udp.h>
+
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/sendfile.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+
+#include "../kselftest_harness.h"
+
+#ifndef UDP_ULP
+#define UDP_ULP		105
+#endif
+
+#ifndef SOL_UDP
+#define SOL_UDP		17
+#endif
+
+// 1. QUIC ULP Registration Test
+
+FIXTURE(quic_ulp)
+{
+	int sfd;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_ulp)
+{
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv4)
+{
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7101,
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv6)
+{
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7102,
+};
+
+FIXTURE_SETUP(quic_ulp)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
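+	// The fixture relies on the network namespaces created by
+	// quic.sh: ns2 hosts the server socket, the test itself runs in
+	// the default namespace.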
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+		   sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+FIXTURE_TEARDOWN(quic_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_nonexistent_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "nonexistent", sizeof("nonexistent")), -1);
+	// If UDP_ULP option is not present, the error would be ENOPROTOOPT.
+	ASSERT_EQ(errno, ENOENT);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_quic_crypto_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+// 2. QUIC Data Path Operation Tests
+
+#define DO_NOT_SETUP_FLOW 0
+#define SETUP_FLOW 1
+
+#define DO_NOT_USE_CLIENT 0
+#define USE_CLIENT 1
+
+FIXTURE(quic_data)
+{
+	int sfd, c1fd, c2fd;
+	socklen_t len_c1;
+	socklen_t len_c2;
+	socklen_t len_s;
+
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_1;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_2;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_1_net_ns_fd;
+	int client_2_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_data)
+{
+	unsigned int af_client_1;
+	char *client_1_address;
+	unsigned short client_1_port;
+	uint8_t conn_id_1[8];
+	uint8_t conn_1_key[16];
+	uint8_t conn_1_iv[12];
+	uint8_t conn_1_hdr_key[16];
+	size_t conn_id_1_len;
+	bool setup_flow_1;
+	bool use_client_1;
+	unsigned int af_client_2;
+	char *client_2_address;
+	unsigned short client_2_port;
+	uint8_t conn_id_2[8];
+	uint8_t conn_2_key[16];
+	uint8_t conn_2_iv[12];
+	uint8_t conn_2_hdr_key[16];
+	size_t conn_id_2_len;
+	bool setup_flow_2;
+	bool use_client_2;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv4)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.1",
+	.client_1_port = 6667,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6668,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 6669,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_two_conns)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.1",
+	.client_1_port = 6670,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6671,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6672,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv4_one_conn)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.3",
+	.client_1_port = 6676,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6676,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6677,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv6_one_conn)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.3",
+	.client_1_port = 6678,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6678,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6679,
+};
+
+FIXTURE_SETUP(quic_data)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
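+	// Topology from quic.sh: client 1 lives in ns11, client 2 in
+	// ns12 and the server in ns2, all attached to the same bridge.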
+	if (variant->af_client_1 == AF_INET) {
+		self->len_c1 = sizeof(self->client_1.addr);
+		self->client_1.addr.sin_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr.sin_addr);
+		self->client_1.addr.sin_port = htons(variant->client_1_port);
+	} else {
+		self->len_c1 = sizeof(self->client_1.addr6);
+		self->client_1.addr6.sin6_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr6.sin6_addr);
+		self->client_1.addr6.sin6_port = htons(variant->client_1_port);
+	}
+
+	if (variant->af_client_2 == AF_INET) {
+		self->len_c2 = sizeof(self->client_2.addr);
+		self->client_2.addr.sin_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr.sin_addr);
+		self->client_2.addr.sin_port = htons(variant->client_2_port);
+	} else {
+		self->len_c2 = sizeof(self->client_2.addr6);
+		self->client_2.addr6.sin6_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr6.sin6_addr);
+		self->client_2.addr6.sin6_port = htons(variant->client_2_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_1_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_1_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns12");
+	self->client_2_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_2_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		self->c1fd = socket(variant->af_client_1, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c1fd, SOL_SOCKET, SO_REUSEPORT, &optval,
+		   sizeof(optval)), -1);
+		if (variant->af_client_1 == AF_INET) {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr,
+					      &self->len_c1), 0);
+		} else {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr6,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr6,
+					      &self->len_c1), 0);
+		}
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		self->c2fd = socket(variant->af_client_2, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c2fd, SOL_SOCKET, SO_REUSEPORT, &optval,
+		   sizeof(optval)), -1);
+		if (variant->af_client_2 == AF_INET) {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr,
+					      &self->len_c2), 0);
+		} else {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr6,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr6,
+					      &self->len_c2), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+	   sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_data)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+	/* The client sockets only exist if the variant used them. */
+	if (self->c1fd > 0)
+		close(self->c1fd);
+	ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+	if (self->c2fd > 0)
+		close(self->c2fd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_data, send_fail_no_flow)
+{
+	char const *test_str = "test_read";
+	int send_len = 10;
+
+	ASSERT_EQ(strlen(test_str) + 1, send_len);
+	EXPECT_EQ(sendto(self->sfd, test_str, send_len, 0,
+			 &self->client_1.addr, self->len_c1), -1);
+};
+
+TEST_F(quic_data, encrypt_two_conn_gso_1200_iov_2_size_9000_aesgcm128)
+{
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_connection_info conn_1_info;
+	struct quic_connection_info conn_2_info;
+	struct quic_tx_ancillary_data *anc_data;
+	socklen_t recv_addr_len_1;
+	socklen_t recv_addr_len_2;
+	struct cmsghdr *cmsg_hdr;
+	int frag_size = 1200;
+	int send_len = 9000;
+	struct iovec iov[2];
+	int msg_len = 4500;
+	struct msghdr msg;
+	char *test_str_1;
+	char *test_str_2;
+	char *buf_1;
+	char *buf_2;
+	int i;
+
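+	// Build two 9000-byte payloads in which every (1200 - 16)-byte
+	// GSO chunk starts with a QUIC short header carrying the
+	// respective connection ID, leaving room in each wire segment
+	// for the 16-byte AEAD tag.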
+	test_str_1 = (char *)malloc(9000);
+	test_str_2 = (char *)malloc(9000);
+	memset(test_str_1, 0, 9000);
+	memset(test_str_2, 0, 9000);
+
+	buf_1 = (char *)malloc(10000);
+	buf_2 = (char *)malloc(10000);
+	for (i = 0; i < 9000; i += (1200 - 16)) {
+		test_str_1[i] = 0x40;
+		memcpy(&test_str_1[i + 1], &variant->conn_id_1,
+		       variant->conn_id_1_len);
+		test_str_1[i + 1 + variant->conn_id_1_len] = 0xca;
+
+		test_str_2[i] = 0x40;
+		memcpy(&test_str_2[i + 1], &variant->conn_id_2,
+		       variant->conn_id_2_len);
+		test_str_2[i + 1 + variant->conn_id_2_len] = 0xca;
+	}
+
+	// program the connection into the offload
+	conn_1_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_1_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_1_info.key.conn_id_length = variant->conn_id_1_len;
+	memcpy(conn_1_info.key.conn_id,
+	       &variant->conn_id_1,
+	       variant->conn_id_1_len);
+
+	conn_2_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_2_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_2_info.key.conn_id_length = variant->conn_id_2_len;
+	memcpy(conn_2_info.key.conn_id,
+	       &variant->conn_id_2,
+	       variant->conn_id_2_len);
+
+	memcpy(&conn_1_info.crypto_info_aes_gcm_128.packet_encryption_key,
+	       &variant->conn_1_key, 16);
+	memcpy(&conn_1_info.crypto_info_aes_gcm_128.packet_encryption_iv,
+	       &variant->conn_1_iv, 12);
+	memcpy(&conn_1_info.crypto_info_aes_gcm_128.header_encryption_key,
+	       &variant->conn_1_hdr_key, 16);
+	memcpy(&conn_2_info.crypto_info_aes_gcm_128.packet_encryption_key,
+	       &variant->conn_2_key, 16);
+	memcpy(&conn_2_info.crypto_info_aes_gcm_128.packet_encryption_iv,
+	       &variant->conn_2_iv, 12);
+	memcpy(&conn_2_info.crypto_info_aes_gcm_128.header_encryption_key,
+	       &variant->conn_2_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+
+	if (variant->setup_flow_1)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)), 0);
+
+	if (variant->setup_flow_2)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)), 0);
+
+	recv_addr_len_1 = self->len_c1;
+	recv_addr_len_2 = self->len_c2;
+
+	iov[0].iov_base = test_str_1;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_1 + 4500;
+	iov[1].iov_len = msg_len;
+
+	msg.msg_name = (self->client_1.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_1.addr
+		       : (void *)&self->client_1.addr6;
+	msg.msg_namelen = self->len_c1;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->flags = 0;
+	anc_data->conn_id_length = variant->conn_id_1_len;
+
+	if (variant->use_client_1)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	iov[0].iov_base = test_str_2;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_2 + 4500;
+	iov[1].iov_len = msg_len;
+	msg.msg_name = (self->client_2.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_2.addr
+		       : (void *)&self->client_2.addr6;
+	msg.msg_namelen = self->len_c2;
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_2_len;
+	anc_data->flags = 0;
+
+	if (variant->use_client_2)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		if (variant->af_client_1 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr,
+						   &recv_addr_len_1),
+					  1200);
+				// Validate framing is intact.
+				EXPECT_EQ(memcmp((void *)buf_1 + 1,
+						 &variant->conn_id_1,
+						 variant->conn_id_1_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr6,
+						   &recv_addr_len_1),
+					1200);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr6,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_1, test_str_1, send_len), 0);
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		if (variant->af_client_2 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr6,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr6,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_2, test_str_2, send_len), 0);
+	}
+
+	if (variant->use_client_1 && variant->use_client_2)
+		EXPECT_NE(memcmp(buf_1, buf_2, send_len), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	if (variant->setup_flow_1) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)),
+			  0);
+	}
+	if (variant->setup_flow_2) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)),
+			  0);
+	}
+	free(test_str_1);
+	free(test_str_2);
+	free(buf_1);
+	free(buf_2);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+// 3. QUIC Encryption Tests
+
+FIXTURE(quic_crypto)
+{
+	int sfd, cfd;
+	socklen_t len_c;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_crypto)
+{
+	unsigned int af_client;
+	char *client_address;
+	unsigned short client_port;
+	uint32_t algo;
+	uint8_t conn_id[8];
+	uint8_t conn_key[16];
+	uint8_t conn_iv[12];
+	uint8_t conn_hdr_key[16];
+	size_t conn_id_len;
+	bool setup_flow;
+	bool use_client;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_SETUP(quic_crypto)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client == AF_INET) {
+		self->len_c = sizeof(self->client.addr);
+		self->client.addr.sin_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr.sin_addr);
+		self->client.addr.sin_port = htons(variant->client_port);
+	} else {
+		self->len_c = sizeof(self->client.addr6);
+		self->client.addr6.sin6_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr6.sin6_addr);
+		self->client.addr6.sin6_port = htons(variant->client_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client) {
+		ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+		self->cfd = socket(variant->af_client, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->cfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			sizeof(optval)), -1);
+		if (variant->af_client == AF_INET) {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr,
+					      &self->len_c), 0);
+		} else {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr6,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr6,
+					      &self->len_c), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+	   sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s),
+			  0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s),
+			  0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_crypto)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	close(self->cfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7667,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {0x87, 0x71, 0xEA, 0x1D, 0xFB, 0xBE, 0x7A, 0x45, 0xBB,
+		0xE2, 0x7E, 0xBC, 0x0B, 0x53, 0x94, 0x99},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {0xC9, 0x8E, 0xFD, 0xF2, 0x0B, 0x64, 0x8C, 0x57,
+		0xB5, 0x0A, 0xB2, 0xD2, 0x21, 0xD3, 0x66, 0xA5},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7669,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7673,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {0x87, 0x71, 0xEA, 0x1D, 0xFB, 0xBE, 0x7A, 0x45, 0xBB,
+		0xE2, 0x7E, 0xBC, 0x0B, 0x53, 0x94, 0x99},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {0xC9, 0x8E, 0xFD, 0xF2, 0x0B, 0x64, 0x8C, 0x57,
+		0xB5, 0x0A, 0xB2, 0xD2, 0x21, 0xD3, 0x66, 0xA5},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7675,
+};
+
+TEST_F(quic_crypto, encrypt_test_vector_aesgcm128_single_flow_gso_in_control)
+{
+	char test_str[37] = {// Header, conn id and pkt num
+			     0x40, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0xCA,
+			     // Payload
+			     0x02, 0x80, 0xDE, 0x40, 0x39, 0x40, 0xF6, 0x00,
+			     0x01, 0x0B, 0x00, 0x0F, 0x65, 0x63, 0x68, 0x6F,
+			     0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+			     0x37, 0x38, 0x39
+	};
+
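+	// Expected wire bytes: the header-protected short header
+	// followed by the AES-GCM-128 ciphertext and its 16-byte tag.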
+	char match_str[53] = {
+			     0x46, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0x1C, 0x44, 0xB8, 0x41, 0xBB, 0xCF, 0x6E,
+			     0x0A, 0x2A, 0x24, 0xFB, 0xB4, 0x79, 0x62, 0xEA,
+			     0x59, 0x38, 0x1A, 0x0E, 0x50, 0x1E, 0x59, 0xED,
+			     0x3F, 0x8E, 0x7E, 0x5A, 0x70, 0xE4, 0x2A, 0xBC,
+			     0x2A, 0xFA, 0x2B, 0x54, 0xEB, 0x89, 0xC3, 0x2C,
+			     0xB6, 0x8C, 0x1E, 0xAB, 0x2D
+	};
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)
+			 + CMSG_SPACE(sizeof(uint16_t))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int send_len = sizeof(test_str);
+	int msg_len = sizeof(test_str);
+	uint16_t frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	int wrong_frag_size = 26;
+	socklen_t recv_addr_len;
+	struct iovec iov[2];
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(1024);
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.packet_encryption_key,
+	       &variant->conn_key, 16);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.packet_encryption_iv,
+	       &variant->conn_iv, 12);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.header_encryption_key,
+	       &variant->conn_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &wrong_frag_size,
+			     sizeof(wrong_frag_size)), 0);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+
+	iov[0].iov_base = test_str;
+	iov[0].iov_len = msg_len;
+
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_len;
+
+	cmsg_hdr = CMSG_NXTHDR(&msg, cmsg_hdr);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_SEGMENT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+	memcpy(CMSG_DATA(cmsg_hdr), (void *)&frag_size, sizeof(frag_size));
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr, &recv_addr_len),
+			  sizeof(match_str));
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  sizeof(match_str));
+	}
+	// The expected output is binary, compare it with memcmp().
+	EXPECT_EQ(memcmp(buf, match_str, sizeof(match_str)), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_crypto, encrypt_test_vector_aesgcm128_single_flow_gso_in_setsockopt)
+{
+	char test_str[37] = {// Header, conn id and pkt num
+			     0x40, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0xCA,
+			     // Payload
+			     0x02, 0x80, 0xDE, 0x40, 0x39, 0x40, 0xF6, 0x00,
+			     0x01, 0x0B, 0x00, 0x0F, 0x65, 0x63, 0x68, 0x6F,
+			     0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+			     0x37, 0x38, 0x39
+	};
+
+	char match_str[53] = {
+			     0x46, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0x1C, 0x44, 0xB8, 0x41, 0xBB, 0xCF, 0x6E,
+			     0x0A, 0x2A, 0x24, 0xFB, 0xB4, 0x79, 0x62, 0xEA,
+			     0x59, 0x38, 0x1A, 0x0E, 0x50, 0x1E, 0x59, 0xED,
+			     0x3F, 0x8E, 0x7E, 0x5A, 0x70, 0xE4, 0x2A, 0xBC,
+			     0x2A, 0xFA, 0x2B, 0x54, 0xEB, 0x89, 0xC3, 0x2C,
+			     0xB6, 0x8C, 0x1E, 0xAB, 0x2D
+	};
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int send_len = sizeof(test_str);
+	int msg_len = sizeof(test_str);
+	struct cmsghdr *cmsg_hdr;
+	socklen_t recv_addr_len;
+	int frag_size = 1200;
+	struct iovec iov[2];
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(1024);
+
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(&conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.packet_encryption_key,
+	       &variant->conn_key, 16);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.packet_encryption_iv,
+	       &variant->conn_iv, 12);
+	memcpy(&conn_info.crypto_info_aes_gcm_128.header_encryption_key,
+	       &variant->conn_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)),
+		  0);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)),
+		  0);
+
+	recv_addr_len = self->len_c;
+
+	iov[0].iov_base = test_str;
+	iov[0].iov_len = msg_len;
+
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_len;
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr, &recv_addr_len),
+			  sizeof(match_str));
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  sizeof(match_str));
+	}
+	// The expected output is binary, compare it with memcmp().
+	EXPECT_EQ(memcmp(buf, match_str, sizeof(match_str)), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/quic.sh b/tools/testing/selftests/net/quic.sh
new file mode 100755
index 000000000000..6c684e670e82
--- /dev/null
+++ b/tools/testing/selftests/net/quic.sh
@@ -0,0 +1,45 @@
+#!/bin/bash
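+# Build the test topology: ns11 and ns12 host the client sockets, ns2
+# hosts the server socket, all three connected through bridge br1 in
+# the default namespace. Then run the quic selftest binary.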
+
+sudo ip netns add ns11
+sudo ip netns add ns12
+sudo ip netns add ns2
+sudo ip link add veth11 type veth peer name br-veth11
+sudo ip link add veth12 type veth peer name br-veth12
+sudo ip link add veth2 type veth peer name br-veth2
+sudo ip link set veth11 netns ns11
+sudo ip link set veth12 netns ns12
+sudo ip link set veth2 netns ns2
+sudo ip netns exec ns11 ip addr add 10.0.0.1/24 dev veth11
+sudo ip netns exec ns11 ip addr add ::ffff:10.0.0.1/96 dev veth11
+sudo ip netns exec ns11 ip addr add 2001::1/64 dev veth11
+sudo ip netns exec ns12 ip addr add 10.0.0.3/24 dev veth12
+sudo ip netns exec ns12 ip addr add ::ffff:10.0.0.3/96 dev veth12
+sudo ip netns exec ns12 ip addr add 2001::3/64 dev veth12
+sudo ip netns exec ns2 ip addr add 10.0.0.2/24 dev veth2
+sudo ip netns exec ns2 ip addr add ::ffff:10.0.0.2/96 dev veth2
+sudo ip netns exec ns2 ip addr add 2001::2/64 dev veth2
+sudo ip link add name br1 type bridge forward_delay 0
+sudo ip link set br1 up
+sudo ip link set br-veth11 up
+sudo ip link set br-veth12 up
+sudo ip link set br-veth2 up
+sudo ip netns exec ns11 ip link set veth11 up
+sudo ip netns exec ns12 ip link set veth12 up
+sudo ip netns exec ns2 ip link set veth2 up
+sudo ip link set br-veth11 master br1
+sudo ip link set br-veth12 master br1
+sudo ip link set br-veth2 master br1
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+
+printf "%s" "Waiting for bridge to start forwarding ..."
+while ! timeout 0.5 sudo ip netns exec ns2 ping -c 1 -n 2001::1 &> /dev/null
+do
+	printf "%c" "."
+done
+printf "\n%s\n"  "Bridge is operational"
+
+sudo ./quic
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+sudo ip netns delete ns2
+sudo ip netns delete ns12
+sudo ip netns delete ns11
+sudo ip link delete br1
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-03 16:40   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-08-03 18:23     ` Andrew Lunn
  2022-08-03 18:51       ` Adel Abouchaev
  2022-08-04 13:57     ` Jonathan Corbet
  1 sibling, 1 reply; 77+ messages in thread
From: Andrew Lunn @ 2022-08-03 18:23 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: kuba, davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

> +Statistics
> +==========
> +
> +QUIC Tx offload to the kernel has counters reflected in /proc/net/quic_stat:
> +
> +  QuicCurrTxSw  - number of currently active kernel offloaded QUIC connections
> +  QuicTxSw      - accumulative total number of offloaded QUIC connections
> +  QuicTxSwError - accumulative total number of errors during QUIC Tx offload to
> +                  kernel

netlink messages please, not /proc for statistics. netlink is the
preferred way to configure and report about the network stack.

	 Andrew

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-03 18:23     ` Andrew Lunn
@ 2022-08-03 18:51       ` Adel Abouchaev
  2022-08-04 15:29         ` Andrew Lunn
  0 siblings, 1 reply; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-03 18:51 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: kuba, davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Andrew,

    Could you add more to your comment? The /proc was used similarly to 
kTLS. Netlink is better, though, unsure how ULP stats would fit in it.

Cheers,

Adel.

On 8/3/22 11:23 AM, Andrew Lunn wrote:
>> +Statistics
>> +==========
>> +
>> +QUIC Tx offload to the kernel has counters reflected in /proc/net/quic_stat:
>> +
>> +  QuicCurrTxSw  - number of currently active kernel offloaded QUIC connections
>> +  QuicTxSw      - accumulative total number of offloaded QUIC connections
>> +  QuicTxSwError - accumulative total number of errors during QUIC Tx offload to
>> +                  kernel
> netlink messages please, not /proc for statistics. netlink is the
> preferred way to configure and report about the network stack.
>
> 	 Andrew

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-03 16:40   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
  2022-08-03 18:23     ` Andrew Lunn
@ 2022-08-04 13:57     ` Jonathan Corbet
  1 sibling, 0 replies; 77+ messages in thread
From: Jonathan Corbet @ 2022-08-04 13:57 UTC (permalink / raw)
  To: Adel Abouchaev, kuba
  Cc: davem, edumazet, pabeni, dsahern, shuah, imagedong, netdev,
	linux-doc, linux-kselftest

Adel Abouchaev <adel.abushaev@gmail.com> writes:

> Adding Documentation/networking/quic.rst file to describe kernel QUIC
> code.
>
> Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
> ---
>  Documentation/networking/quic.rst | 176 ++++++++++++++++++++++++++++++
>  1 file changed, 176 insertions(+)
>  create mode 100644 Documentation/networking/quic.rst

When you add a new RST file, you need to add it to the index.rst as well
or it won't be pulled into the docs build.

Also...this all looks like user-space API documentation, so might
Documentation/userspace-api be a better place for it?

Thanks,

jon

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-03 18:51       ` Adel Abouchaev
@ 2022-08-04 15:29         ` Andrew Lunn
  2022-08-04 16:57           ` Adel Abouchaev
  0 siblings, 1 reply; 77+ messages in thread
From: Andrew Lunn @ 2022-08-04 15:29 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: kuba, davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

On Wed, Aug 03, 2022 at 11:51:59AM -0700, Adel Abouchaev wrote:
> Andrew,
> 
>    Could you add more to your comment? The /proc was used similarly to kTLS.
> Netlink is better, though, unsure how ULP stats would fit in it.

How do tools like ss(1) retrieve the protocol summary statistics? Do
they still use /proc, or netlink?

     Andrew

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-04 15:29         ` Andrew Lunn
@ 2022-08-04 16:57           ` Adel Abouchaev
  2022-08-04 17:00             ` Eric Dumazet
  0 siblings, 1 reply; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-04 16:57 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: kuba, davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Looking at
https://github.com/shemminger/iproute2/blob/main/misc/ss.c#L589, ss.c
still uses /proc.

Adel.

On 8/4/22 8:29 AM, Andrew Lunn wrote:
> On Wed, Aug 03, 2022 at 11:51:59AM -0700, Adel Abouchaev wrote:
>> Andrew,
>>
>>     Could you add more to your comment? The /proc was used similarly to kTLS.
>> Netlink is better, though, unsure how ULP stats would fit in it.
> How do tools like ss(1) retrieve the protocol summary statistics? Do
> they still use /proc, or netlink?
>
>       Andrew

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-04 16:57           ` Adel Abouchaev
@ 2022-08-04 17:00             ` Eric Dumazet
  2022-08-04 18:09               ` Jakub Kicinski
  0 siblings, 1 reply; 77+ messages in thread
From: Eric Dumazet @ 2022-08-04 17:00 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: Andrew Lunn, Jakub Kicinski, David Miller, Paolo Abeni,
	Jonathan Corbet, David Ahern, Shuah Khan, Menglong Dong, netdev,
	open list:DOCUMENTATION, open list:KERNEL SELFTEST FRAMEWORK

On Thu, Aug 4, 2022 at 9:58 AM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
>
> Looking at
> https://github.com/shemminger/iproute2/blob/main/misc/ss.c#L589 the ss.c
> still uses proc/.
>

Only for legacy reasons.

ss -t for sure will use netlink first, then fallback to /proc

New counters should use netlink, please.

> Adel.
>
> On 8/4/22 8:29 AM, Andrew Lunn wrote:
> > On Wed, Aug 03, 2022 at 11:51:59AM -0700, Adel Abouchaev wrote:
> >> Andrew,
> >>
> >>     Could you add more to your comment? The /proc was used similarly to kTLS.
> >> Netlink is better, though, unsure how ULP stats would fit in it.
> > How do tools like ss(1) retrieve the protocol summary statistics? Do
> > they still use /proc, or netlink?
> >
> >       Andrew

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-04 17:00             ` Eric Dumazet
@ 2022-08-04 18:09               ` Jakub Kicinski
  2022-08-04 18:45                 ` Eric Dumazet
  0 siblings, 1 reply; 77+ messages in thread
From: Jakub Kicinski @ 2022-08-04 18:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Adel Abouchaev, Andrew Lunn, David Miller, Paolo Abeni,
	Jonathan Corbet, David Ahern, Shuah Khan, Menglong Dong, netdev,
	open list:DOCUMENTATION, open list:KERNEL SELFTEST FRAMEWORK

On Thu, 4 Aug 2022 10:00:37 -0700 Eric Dumazet wrote:
> On Thu, Aug 4, 2022 at 9:58 AM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
> > Looking at
> > https://github.com/shemminger/iproute2/blob/main/misc/ss.c#L589 the ss.c
> > still uses proc/.
> 
> Only for legacy reasons.

That but in all honesty also the fact that a proc file is pretty easy
and self-describing while the historic netlink families are undocumented
code salads.

> ss -t for sure will use netlink first, then fallback to /proc
> 
> New counters should use netlink, please.

Just to be sure I'm not missing anything - we're talking about some 
new netlink, right? Is there an existing place for "overall prot family
stats" over netlink today?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-08-04 18:09               ` Jakub Kicinski
@ 2022-08-04 18:45                 ` Eric Dumazet
  0 siblings, 0 replies; 77+ messages in thread
From: Eric Dumazet @ 2022-08-04 18:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Adel Abouchaev, Andrew Lunn, David Miller, Paolo Abeni,
	Jonathan Corbet, David Ahern, Shuah Khan, Menglong Dong, netdev,
	open list:DOCUMENTATION, open list:KERNEL SELFTEST FRAMEWORK

On Thu, Aug 4, 2022 at 11:09 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 4 Aug 2022 10:00:37 -0700 Eric Dumazet wrote:
> > On Thu, Aug 4, 2022 at 9:58 AM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
> > > Looking at
> > > https://github.com/shemminger/iproute2/blob/main/misc/ss.c#L589 the ss.c
> > > still uses proc/.
> >
> > Only for legacy reasons.
>
> That but in all honesty also the fact that a proc file is pretty easy
> and self-describing while the historic netlink families are undocumented
> code salads.
>
> > ss -t for sure will use netlink first, then fallback to /proc
> >
> > New counters should use netlink, please.
>
> Just to be sure I'm not missing anything - we're talking about some
> new netlink, right? Is there an existing place for "overall prot family
> stats" over netlink today?

I thought we were speaking of dumping ULP info on a per UDP socket basis.

If this is about new SNMP counters, then sure, /proc is fine I guess.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC net-next 0/6] net: support QUIC crypto
  2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (5 preceding siblings ...)
  2022-08-01 19:52   ` [RFC net-next 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
@ 2022-08-05  3:37   ` Bagas Sanjaya
  6 siblings, 0 replies; 77+ messages in thread
From: Bagas Sanjaya @ 2022-08-05  3:37 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: kuba, davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

On Mon, Aug 01, 2022 at 12:52:22PM -0700, Adel Abouchaev wrote:
> QUIC requires end to end encryption of the data. The application usually
> prepares the data in clear text, encrypts and calls send() which implies
> multiple copies of the data before the packets hit the networking stack.
> Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
> pressure by reducing the number of copies.
> 
> The scope of kernel support is limited to the symmetric cryptography,
> leaving the handshake to the user space library. For QUIC in particular,
> the application packets that require symmetric cryptography are the 1RTT
> packets with short headers. Kernel will encrypt the application packets
> on transmission and decrypt on receive. This series implements Tx only,
> because in QUIC server applications Tx outweighs Rx by orders of
> magnitude.
> 
> Supporting the combination of QUIC and GSO requires the application to
> correctly place the data and the kernel to correctly slice it. The
> encryption process appends an arbitrary number of bytes (tag) to the end
> of the message to authenticate it. The GSO value should include this
> overhead, the offload would then subtract the tag size to parse the
> input on Tx before chunking and encrypting it.
> 
> With the kernel cryptography, the buffer copy operation is conjoined
> with the encryption operation. The memory bandwidth is reduced by 5-8%.
> When devices supporting QUIC encryption in hardware come to the market,
> we will be able to free further 7% of CPU utilization which is used
> today for crypto operations.
> 

Hi,

I can't apply this series on top of current net-next. Which net-next
commit is this series based on?

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [RFC net-next v2 0/6] net: support QUIC crypto
       [not found] <Adel Abouchaev <adel.abushaev@gmail.com>
  2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
  2022-08-03 16:40 ` Adel Abouchaev
@ 2022-08-06  0:11 ` Adel Abouchaev
  2022-08-06  0:11   ` [RFC net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
                     ` (5 more replies)
  2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
                   ` (4 subsequent siblings)
  7 siblings, 6 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-06  0:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

QUIC requires end to end encryption of the data. The application usually
prepares the data in clear text, encrypts and calls send() which implies
multiple copies of the data before the packets hit the networking stack.
Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
pressure by reducing the number of copies.

The scope of kernel support is limited to the symmetric cryptography,
leaving the handshake to the user space library. For QUIC in particular,
the application packets that require symmetric cryptography are the 1RTT
packets with short headers. Kernel will encrypt the application packets
on transmission and decrypt on receive. This series implements Tx only,
because in QUIC server applications Tx outweighs Rx by orders of
magnitude.

Supporting the combination of QUIC and GSO requires the application to
correctly place the data and the kernel to correctly slice it. The
encryption process appends an arbitrary number of bytes (tag) to the end
of the message to authenticate it. The GSO value should include this
overhead, the offload would then subtract the tag size to parse the
input on Tx before chunking and encrypting it.
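
For illustration only (an editorial sketch, not part of the series): the
relationship between the GSO value handed to the kernel and the plaintext
chunk size the application must use could be expressed as below; the
1200/1184/16 numbers mirror the example in the documentation patch, and the
helper name is made up.

/* Hedged sketch: size arithmetic only, no real socket or kernel API used. */
#include <stddef.h>
#include <stdio.h>

/* The GSO value covers the ciphertext, i.e. the plaintext plus the
 * authentication tag the cipher appends to each packet.
 */
static size_t plain_chunk_size(size_t gso_size, size_t tag_size)
{
	return gso_size - tag_size;
}

int main(void)
{
	size_t gso = 1200;	/* value passed via UDP_SEGMENT */
	size_t tag = 16;	/* e.g. AES-GCM-128 tag size */

	/* Plaintext packets must be placed every 1184 bytes. */
	printf("plaintext chunk size: %zu\n", plain_chunk_size(gso, tag));
	return 0;
}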

With the kernel cryptography, the buffer copy operation is conjoined
with the encryption operation. The memory bandwidth is reduced by 5-8%.
When devices supporting QUIC encryption in hardware come to the market,
we will be able to free further 7% of CPU utilization which is used
today for crypto operations.


Adel Abouchaev (6):
  Documentation on QUIC kernel Tx crypto.
  Define QUIC specific constants, control and data plane structures
  Add UDP ULP operations, initialization and handling prototype
    functions.
  Implement QUIC offload functions
  Add flow counters and Tx processing error counter
  Add self tests for ULP operations, flow setup and crypto tests

v2 changes:
  - Moved the inner QUIC Kconfig from the ULP patch to the QUIC patch.
  - Updated the tests to match the uAPI context structure fields.
  - Formatted the quic.rst document.

 Documentation/networking/index.rst     |    1 +
 Documentation/networking/quic.rst      |  186 +++
 include/net/inet_sock.h                |    2 +
 include/net/netns/mib.h                |    3 +
 include/net/quic.h                     |   59 +
 include/net/snmp.h                     |    6 +
 include/net/udp.h                      |   33 +
 include/uapi/linux/quic.h              |   61 +
 include/uapi/linux/snmp.h              |   11 +
 include/uapi/linux/udp.h               |    4 +
 net/Kconfig                            |    1 +
 net/Makefile                           |    1 +
 net/ipv4/Makefile                      |    3 +-
 net/ipv4/udp.c                         |   14 +
 net/ipv4/udp_ulp.c                     |  190 ++++
 net/quic/Kconfig                       |   16 +
 net/quic/Makefile                      |    8 +
 net/quic/quic_main.c                   | 1446 ++++++++++++++++++++++++
 net/quic/quic_proc.c                   |   45 +
 tools/testing/selftests/net/.gitignore |    3 +-
 tools/testing/selftests/net/Makefile   |    2 +-
 tools/testing/selftests/net/quic.c     | 1024 +++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   45 +
 23 files changed, 3161 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/quic.rst
 create mode 100644 include/net/quic.h
 create mode 100644 include/uapi/linux/quic.h
 create mode 100644 net/ipv4/udp_ulp.c
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c
 create mode 100644 net/quic/quic_proc.c
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

-- 
2.30.2


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [RFC net-next v2 1/6] Documentation on QUIC kernel Tx crypto.
  2022-08-06  0:11 ` [RFC net-next v2 0/6] net: support QUIC crypto Adel Abouchaev
@ 2022-08-06  0:11   ` Adel Abouchaev
  2022-08-06  3:05     ` Bagas Sanjaya
  2022-08-06  0:11   ` [RFC net-next v2 2/6] Define QUIC specific constants, control and data plane structures Adel Abouchaev
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-06  0:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, kernel test robot

Adding Documentation/networking/quic.rst file to describe kernel QUIC
code.

Signed-off-by: Adel Abouchaev <adelab@fb.com>

---

v2: Added quic.rst reference to the index.rst file; fixed indentation in
the quic.rst file.
Reported-by: kernel test robot <lkp@intel.com>
---
 Documentation/networking/index.rst |   1 +
 Documentation/networking/quic.rst  | 186 +++++++++++++++++++++++++++++
 2 files changed, 187 insertions(+)
 create mode 100644 Documentation/networking/quic.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 03b215bddde8..656fa1dac26b 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -90,6 +90,7 @@ Contents:
    plip
    ppp_generic
    proc_net_tcp
+   quic
    radiotap-headers
    rds
    regulatory
diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..416099b80e60
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,186 @@
+.. _kernel_quic:
+
+===========
+KERNEL QUIC
+===========
+
+Overview
+========
+
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 for more information on QUIC.
+
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are 1RTT packets
+in a QUIC connection. Encryption of all other packets is still done by the
+QUIC library in user space.
+
+
+
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+
+A QUIC connection originates and terminates in the application, using one of
+many available QUIC libraries. The code instantiates a QUIC client and a QUIC
+server in some form and configures them to use certain addresses and ports for
+the source and destination. The client and server negotiate the set of keys to
+protect the communication during different phases of the connection, maintain
+the connection and perform congestion control.
+
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow that should be encrypted by the kernel needs to be registered with
+the kernel using the socket API. A setsockopt() call on the socket creates an
+association between the QUIC connection ID of the flow and the encryption
+parameters for the crypto operations:
+
+.. code-block:: c
+
+	struct quic_connection_info conn_info;
+	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+	const size_t conn_id_len = sizeof(conn_id);
+	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			    0x08, 0x09, 0x0a, 0x0b};
+	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
+				};
+
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = 5;
+	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+				      - conn_id_len],
+	       &conn_id, conn_id_len);
+
+	memcpy(&conn_info.aes_gcm_128.payload_key, conn_key, sizeof(conn_key));
+	memcpy(&conn_info.aes_gcm_128.payload_iv, conn_iv, sizeof(conn_iv));
+	memcpy(&conn_info.aes_gcm_128.header_key, conn_hdr_key, sizeof(conn_hdr_key));
+
+	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+
+Requesting to remove QUIC Tx kernel crypto offload control messages
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. To explicitly remove the
+offload for a connection during the lifetime of the socket, the process is
+similar to adding the flow. Only the connection ID and its length need to be
+supplied to remove the connection from the offload:
+
+.. code-block:: c
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = 5;
+	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+				      - conn_id_len],
+	       &conn_id, conn_id_len);
+	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+Sending QUIC application data
+-----------------------------
+
+For QUIC Tx encryption offload, the application should use the sendmsg()
+socket call and provide ancillary data carrying the connection ID length and
+offload flags, so that the kernel can perform the encryption and, if
+requested, GSO.
+
+.. code-block:: c
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_tx_ancillary_data *anc_data;
+	size_t quic_data_len = 4500;
+	struct cmsghdr *cmsg_hdr;
+	char quic_data[9000];
+	struct iovec iov[2];
+	int send_len = 9000;
+	struct msghdr msg;
+	int err;
+
+	iov[0].iov_base = quic_data;
+	iov[0].iov_len = quic_data_len;
+	iov[1].iov_base = quic_data + 4500;
+	iov[1].iov_len = quic_data_len;
+
+	if (client.addr.sin_family == AF_INET) {
+		msg.msg_name = &client.addr;
+		msg.msg_namelen = sizeof(client.addr);
+	} else {
+		msg.msg_name = &client.addr6;
+		msg.msg_namelen = sizeof(client.addr6);
+	}
+
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = conn_id_len;
+	err = sendmsg(self->sfd, &msg, 0);
+
+QUIC Tx offload in the kernel will read the data from user space, encrypting
+and copying it into the ciphertext buffer within the same operation.
+
+
+Sending QUIC application data with GSO
+--------------------------------------
+When GSO is in use, the kernel will use the GSO fragment size as the target
+size for the ciphertext. Packets from user space should be aligned on the
+boundary of the GSO fragment size minus the tag size of the chosen cipher.
+For a GSO fragment size of 1200 and a tag size of 16, plaintext packets
+should follow each other every 1184 bytes. After encryption, the rest of the
+UDP and IP stacks will use the configured GSO fragment size, which includes
+the trailing tag bytes.
+
+To set up GSO fragmentation:
+
+.. code-block:: c
+
+	setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+		   sizeof(frag_size));
+
+If the GSO fragment size is provided in ancillary data within the sendmsg()
+call, the value in ancillary data will take precedence over the segment size
+provided in setsockopt to split the payload into packets. This is consistent
+with the UDP stack behavior.
+
+Integrating to userspace QUIC libraries
+---------------------------------------
+
+Integration with userspace QUIC libraries depends on the implementation of the
+QUIC protocol. For the MVFST library, the control plane is integrated into the
+handshake callbacks to properly configure the flows on the socket, and the
+data plane is integrated into the methods that perform encryption and send
+the packets to the batch scheduler for transmission on the socket.
+
+MVFST library can be found at https://github.com/facebookincubator/mvfst.
+
+Statistics
+==========
+
+QUIC Tx offload to the kernel exposes counters in
+``/proc/net/quic_stat``:
+
+- ``QuicCurrTxSw`` -
+  number of currently active kernel offloaded QUIC connections
+- ``QuicTxSw`` -
+  cumulative total number of offloaded QUIC connections
+- ``QuicTxSwError`` -
+  cumulative total number of errors during QUIC Tx offload to the kernel
+
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next v2 2/6] Define QUIC specific constants, control and data plane structures
  2022-08-06  0:11 ` [RFC net-next v2 0/6] net: support QUIC crypto Adel Abouchaev
  2022-08-06  0:11   ` [RFC net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-08-06  0:11   ` Adel Abouchaev
  2022-08-06  0:11   ` [RFC net-next v2 3/6] Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-06  0:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Define the control and data plane structures passed via the control plane
for flow add/remove and via ancillary data during packet send. Define the
constants used within SOL_UDP to program QUIC sockets.

Signed-off-by: Adel Abouchaev <adelab@fb.com>
---
 include/uapi/linux/quic.h | 61 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/udp.h  |  3 ++
 2 files changed, 64 insertions(+)
 create mode 100644 include/uapi/linux/quic.h

diff --git a/include/uapi/linux/quic.h b/include/uapi/linux/quic.h
new file mode 100644
index 000000000000..79680b8b18a6
--- /dev/null
+++ b/include/uapi/linux/quic.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_QUIC_H
+#define _UAPI_LINUX_QUIC_H
+
+#include <linux/types.h>
+#include <linux/tls.h>
+
+#define QUIC_MAX_CONNECTION_ID_SIZE 20
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_BYPASS_ENCRYPTION 0x01
+
+struct quic_tx_ancillary_data {
+	__aligned_u64	next_pkt_num;
+	__u8	flags;
+	__u8	conn_id_length;
+};
+
+struct quic_connection_info_key {
+	__u8	conn_id[QUIC_MAX_CONNECTION_ID_SIZE];
+	__u8	conn_id_length;
+};
+
+struct quic_aes_gcm_128 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+};
+
+struct quic_aes_gcm_256 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_256_IV_SIZE];
+};
+
+struct quic_aes_ccm_128 {
+	__u8	header_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_CCM_128_IV_SIZE];
+};
+
+struct quic_chacha20_poly1305 {
+	__u8	header_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE];
+};
+
+struct quic_connection_info {
+	__u16	cipher_type;
+	struct quic_connection_info_key		key;
+	union {
+		struct quic_aes_gcm_128 aes_gcm_128;
+		struct quic_aes_gcm_256 aes_gcm_256;
+		struct quic_aes_ccm_128 aes_ccm_128;
+		struct quic_chacha20_poly1305 chacha20_poly1305;
+	};
+};
+
+#endif
+
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 4828794efcf8..0ee4c598e70b 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,9 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
+#define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
+#define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next v2 3/6] Add UDP ULP operations, initialization and handling prototype functions.
  2022-08-06  0:11 ` [RFC net-next v2 0/6] net: support QUIC crypto Adel Abouchaev
  2022-08-06  0:11   ` [RFC net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
  2022-08-06  0:11   ` [RFC net-next v2 2/6] Define QUIC specific constants, control and data plane structures Adel Abouchaev
@ 2022-08-06  0:11   ` Adel Abouchaev
  2022-08-06  0:11   ` [RFC net-next v2 4/6] Implement QUIC offload functions Adel Abouchaev
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-06  0:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Define functions to add UDP ULP handling and registration with the UDP
protocol, along with supporting data structures. Create a structure for the
QUIC ULP and add empty prototype functions to support it.
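
For illustration only (an editorial sketch, not part of this patch): a module
providing a ULP on top of this framework would register itself roughly as
follows. The function and variable names below are hypothetical; only
udp_register_ulp(), udp_unregister_ulp(), struct udp_ulp_ops and
MODULE_ALIAS_UDP_ULP() come from this series.

#include <linux/module.h>
#include <net/udp.h>

static int sample_ulp_init(struct sock *sk)
{
	/* Allocate and attach per-socket ULP state here. */
	return 0;
}

static void sample_ulp_release(struct sock *sk)
{
	/* Free per-socket ULP state here. */
}

static struct udp_ulp_ops sample_udp_ulp_ops = {
	.name		= "sample",
	.owner		= THIS_MODULE,
	.init		= sample_ulp_init,
	.release	= sample_ulp_release,
};

static int __init sample_ulp_module_init(void)
{
	return udp_register_ulp(&sample_udp_ulp_ops);
}

static void __exit sample_ulp_module_exit(void)
{
	udp_unregister_ulp(&sample_udp_ulp_ops);
}

module_init(sample_ulp_module_init);
module_exit(sample_ulp_module_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS_UDP_ULP("sample");

From user space, such a ULP would then presumably be attached with
setsockopt(fd, SOL_UDP, UDP_ULP, "sample", sizeof("sample")), mirroring the
TCP_ULP convention.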

Signed-off-by: Adel Abouchaev <adelab@fb.com>

---

v2: Removed reference to net/quic/Kconfig from this patch into the next.
---
 include/net/inet_sock.h  |   2 +
 include/net/udp.h        |  33 +++++++
 include/uapi/linux/udp.h |   1 +
 net/Makefile             |   1 +
 net/ipv4/Makefile        |   3 +-
 net/ipv4/udp.c           |   6 ++
 net/ipv4/udp_ulp.c       | 190 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 235 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/udp_ulp.c

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index bf5654ce711e..650e332bdb50 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -249,6 +249,8 @@ struct inet_sock {
 	__be32			mc_addr;
 	struct ip_mc_socklist __rcu	*mc_list;
 	struct inet_cork_full	cork;
+	const struct udp_ulp_ops	*udp_ulp_ops;
+	void __rcu		*ulp_data;
 };
 
 #define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
diff --git a/include/net/udp.h b/include/net/udp.h
index 5ee88ddf79c3..f22ebabbb186 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -523,4 +523,37 @@ struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
 int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 #endif
 
+/*
+ * Interface for adding Upper Level Protocols over UDP
+ */
+
+#define UDP_ULP_NAME_MAX	16
+#define UDP_ULP_MAX		128
+
+struct udp_ulp_ops {
+	struct list_head	list;
+
+	/* initialize ulp */
+	int (*init)(struct sock *sk);
+	/* cleanup ulp */
+	void (*release)(struct sock *sk);
+
+	char		name[UDP_ULP_NAME_MAX];
+	struct module	*owner;
+};
+
+int udp_register_ulp(struct udp_ulp_ops *type);
+void udp_unregister_ulp(struct udp_ulp_ops *type);
+int udp_set_ulp(struct sock *sk, const char *name);
+void udp_get_available_ulp(char *buf, size_t len);
+void udp_cleanup_ulp(struct sock *sk);
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval,
+		       unsigned int optlen);
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval,
+		       int __user *optlen);
+
+#define MODULE_ALIAS_UDP_ULP(name)\
+	__MODULE_INFO(alias, alias_userspace, name);\
+	__MODULE_INFO(alias, alias_udp_ulp, "udp-ulp-" name)
+
 #endif	/* _UDP_H */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 0ee4c598e70b..893691f0108a 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_ULP		105	/* Attach ULP to a UDP socket */
 #define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
 #define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
 #define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
diff --git a/net/Makefile b/net/Makefile
index fbfeb8a0bb37..28565bfe29cb 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -16,6 +16,7 @@ obj-y				+= ethernet/ 802/ sched/ netlink/ bpf/ ethtool/
 obj-$(CONFIG_NETFILTER)		+= netfilter/
 obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_TLS)		+= tls/
+obj-$(CONFIG_QUIC)		+= quic/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX_SCM)		+= unix/
 obj-y				+= ipv6/
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index bbdd9c44f14e..88d3baf4af95 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,8 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o
+	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o \
+	     udp_ulp.o
 
 obj-$(CONFIG_BPFILTER) += bpfilter/
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 34eda973bbf1..027c4513a9cd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2779,6 +2779,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->pcflag |= UDPLITE_RECV_CC;
 		break;
 
+	case UDP_ULP:
+		return udp_setsockopt_ulp(sk, optval, optlen);
+
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -2847,6 +2850,9 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
 		val = up->pcrlen;
 		break;
 
+	case UDP_ULP:
+		return udp_getsockopt_ulp(sk, optval, optlen);
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/udp_ulp.c b/net/ipv4/udp_ulp.c
new file mode 100644
index 000000000000..3801ed7ad17d
--- /dev/null
+++ b/net/ipv4/udp_ulp.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Pluggable UDP upper layer protocol support, based on pluggable TCP upper
+ * layer protocol support.
+ *
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwatson@fb.com>. All rights reserved.
+ */
+
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/skmsg.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+
+static DEFINE_SPINLOCK(udp_ulp_list_lock);
+static LIST_HEAD(udp_ulp_list);
+
+/* Simple linear search, don't expect many entries! */
+static struct udp_ulp_ops *udp_ulp_find(const char *name)
+{
+	struct udp_ulp_ops *e;
+
+	list_for_each_entry_rcu(e, &udp_ulp_list, list,
+				lockdep_is_held(&udp_ulp_list_lock)) {
+		if (strcmp(e->name, name) == 0)
+			return e;
+	}
+
+	return NULL;
+}
+
+static const struct udp_ulp_ops *__udp_ulp_find_autoload(const char *name)
+{
+	const struct udp_ulp_ops *ulp = NULL;
+
+	rcu_read_lock();
+	ulp = udp_ulp_find(name);
+
+#ifdef CONFIG_MODULES
+	if (!ulp && capable(CAP_NET_ADMIN)) {
+		rcu_read_unlock();
+		request_module("udp-ulp-%s", name);
+		rcu_read_lock();
+		ulp = udp_ulp_find(name);
+	}
+#endif
+	if (!ulp || !try_module_get(ulp->owner))
+		ulp = NULL;
+
+	rcu_read_unlock();
+	return ulp;
+}
+
+/* Attach new upper layer protocol to the list
+ * of available protocols.
+ */
+int udp_register_ulp(struct udp_ulp_ops *ulp)
+{
+	int ret = 0;
+
+	spin_lock(&udp_ulp_list_lock);
+	if (udp_ulp_find(ulp->name))
+		ret = -EEXIST;
+	else
+		list_add_tail_rcu(&ulp->list, &udp_ulp_list);
+
+	spin_unlock(&udp_ulp_list_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(udp_register_ulp);
+
+void udp_unregister_ulp(struct udp_ulp_ops *ulp)
+{
+	spin_lock(&udp_ulp_list_lock);
+	list_del_rcu(&ulp->list);
+	spin_unlock(&udp_ulp_list_lock);
+
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(udp_unregister_ulp);
+
+void udp_cleanup_ulp(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	/* No sock_owned_by_me() check here as at the time the
+	 * stack calls this function, the socket is dead and
+	 * about to be destroyed.
+	 */
+	if (!inet->udp_ulp_ops)
+		return;
+
+	if (inet->udp_ulp_ops->release)
+		inet->udp_ulp_ops->release(sk);
+	module_put(inet->udp_ulp_ops->owner);
+
+	inet->udp_ulp_ops = NULL;
+}
+
+static int __udp_set_ulp(struct sock *sk, const struct udp_ulp_ops *ulp_ops)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int err;
+
+	err = -EEXIST;
+	if (inet->udp_ulp_ops)
+		goto out_err;
+
+	err = ulp_ops->init(sk);
+	if (err)
+		goto out_err;
+
+	inet->udp_ulp_ops = ulp_ops;
+	return 0;
+
+out_err:
+	module_put(ulp_ops->owner);
+	return err;
+}
+
+int udp_set_ulp(struct sock *sk, const char *name)
+{
+	struct sk_psock *psock = sk_psock_get(sk);
+	const struct udp_ulp_ops *ulp_ops;
+
+	if (psock) {
+		sk_psock_put(sk, psock);
+		return -EINVAL;
+	}
+
+	sock_owned_by_me(sk);
+	ulp_ops = __udp_ulp_find_autoload(name);
+	if (!ulp_ops)
+		return -ENOENT;
+
+	return __udp_set_ulp(sk, ulp_ops);
+}
+
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	char name[UDP_ULP_NAME_MAX];
+	int val, err;
+
+	if (!optlen || optlen > UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	val = strncpy_from_sockptr(name, optval, optlen);
+	if (val < 0)
+		return -EFAULT;
+
+	if (val == UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	name[val] = 0;
+	lock_sock(sk);
+	err = udp_set_ulp(sk, name);
+	release_sock(sk);
+	return err;
+}
+
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval, int __user *optlen)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int len;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	len = min_t(unsigned int, len, UDP_ULP_NAME_MAX);
+	if (len < 0)
+		return -EINVAL;
+
+	if (!inet->udp_ulp_ops) {
+		if (put_user(0, optlen))
+			return -EFAULT;
+		return 0;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, inet->udp_ulp_ops->name, len))
+		return -EFAULT;
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next v2 4/6] Implement QUIC offload functions
  2022-08-06  0:11 ` [RFC net-next v2 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (2 preceding siblings ...)
  2022-08-06  0:11   ` [RFC net-next v2 3/6] Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
@ 2022-08-06  0:11   ` Adel Abouchaev
  2022-08-06  0:11   ` [RFC net-next v2 5/6] Add flow counters and Tx processing error counter Adel Abouchaev
  2022-08-06  0:11   ` [RFC net-next v2 6/6] Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-06  0:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add a connection hash to the context to support add/remove operations
on QUIC connections for the control plane and lookups for the data
plane. Implement setsockopt and add placeholders to add and delete Tx
connections.

Signed-off-by: Adel Abouchaev <adelab@fb.com>

---

v2: Added net/quic/Kconfig reference to net/Kconfig in this commit.
---
 include/net/quic.h   |   49 ++
 net/Kconfig          |    1 +
 net/ipv4/udp.c       |    8 +
 net/quic/Kconfig     |   16 +
 net/quic/Makefile    |    8 +
 net/quic/quic_main.c | 1400 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1482 insertions(+)
 create mode 100644 include/net/quic.h
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c

diff --git a/include/net/quic.h b/include/net/quic.h
new file mode 100644
index 000000000000..15e04ea08c53
--- /dev/null
+++ b/include/net/quic.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef INCLUDE_NET_QUIC_H
+#define INCLUDE_NET_QUIC_H
+
+#include <linux/mutex.h>
+#include <linux/rhashtable.h>
+#include <linux/skmsg.h>
+#include <uapi/linux/quic.h>
+
+#define QUIC_MAX_SHORT_HEADER_SIZE      25
+#define QUIC_MAX_CONNECTION_ID_SIZE     20
+#define QUIC_HDR_MASK_SIZE              16
+#define QUIC_MAX_GSO_FRAGS              16
+
+// Maximum IV and nonce sizes should be in sync with supported ciphers.
+#define QUIC_CIPHER_MAX_IV_SIZE		12
+#define QUIC_CIPHER_MAX_NONCE_SIZE	16
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_ANCILLARY_FLAGS    (QUIC_BYPASS_ENCRYPTION)
+
+#define QUIC_MAX_IOVEC_SEGMENTS		8
+#define QUIC_MAX_SG_ALLOC_ELEMENTS	32
+#define QUIC_MAX_PLAIN_PAGES		16
+#define QUIC_MAX_CIPHER_PAGES_ORDER	4
+
+struct quic_internal_crypto_context {
+	struct quic_connection_info	conn_info;
+	struct crypto_skcipher		*header_tfm;
+	struct crypto_aead		*packet_aead;
+};
+
+struct quic_connection_rhash {
+	struct rhash_head			node;
+	struct quic_internal_crypto_context	crypto_ctx;
+	struct rcu_head				rcu;
+};
+
+struct quic_context {
+	struct proto		*sk_proto;
+	struct rhashtable	tx_connections;
+	struct scatterlist	sg_alloc[QUIC_MAX_SG_ALLOC_ELEMENTS];
+	struct page		*cipher_page;
+	struct mutex		sendmsg_mux;
+	struct rcu_head		rcu;
+};
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 6b78f695caa6..93e3b1308aec 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -63,6 +63,7 @@ menu "Networking options"
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/tls/Kconfig"
+source "net/quic/Kconfig"
 source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 source "net/smc/Kconfig"
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 027c4513a9cd..6f56d3cbaeee 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,6 +113,7 @@
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
+#include <uapi/linux/quic.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6_stubs.h>
 #endif
@@ -1011,6 +1012,13 @@ static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
 			return -EINVAL;
 		*gso_size = *(__u16 *)CMSG_DATA(cmsg);
 		return 0;
+	case UDP_QUIC_ENCRYPT:
+		/* This option is handled in UDP_ULP and is only checked
+		 * here for the bypass bit
+		 */
+		if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+			return -EINVAL;
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/quic/Kconfig b/net/quic/Kconfig
new file mode 100644
index 000000000000..661cb989508a
--- /dev/null
+++ b/net/quic/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# QUIC configuration
+#
+config QUIC
+	tristate "QUIC encryption offload"
+	depends on INET
+	select CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_GCM
+	help
+	Enable kernel support for QUIC crypto offload. Currently only TX
+	encryption offload is supported. The kernel will perform
+	copy-during-encryption.
+
+	If unsure, say N.
diff --git a/net/quic/Makefile b/net/quic/Makefile
new file mode 100644
index 000000000000..928239c4d08c
--- /dev/null
+++ b/net/quic/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the QUIC subsystem
+#
+
+obj-$(CONFIG_QUIC) += quic.o
+
+quic-y := quic_main.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
new file mode 100644
index 000000000000..e738c8130a4f
--- /dev/null
+++ b/net/quic/quic_main.c
@@ -0,0 +1,1400 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <crypto/skcipher.h>
+#include <linux/bug.h>
+#include <linux/module.h>
+#include <linux/rhashtable.h>
+// Include header to use TLS constants for AEAD cipher.
+#include <net/tls.h>
+#include <net/quic.h>
+#include <net/udp.h>
+#include <uapi/linux/quic.h>
+
+static unsigned long af_init_done;
+static struct proto quic_v4_proto;
+static struct proto quic_v6_proto;
+static DEFINE_SPINLOCK(quic_proto_lock);
+
+static u32 quic_tx_connection_hash(const void *data, u32 len, u32 seed)
+{
+	return jhash(data, len, seed);
+}
+
+static u32 quic_tx_connection_hash_obj(const void *data, u32 len, u32 seed)
+{
+	const struct quic_connection_rhash *connhash = data;
+
+	return jhash(&connhash->crypto_ctx.conn_info.key,
+		     sizeof(struct quic_connection_info_key), seed);
+}
+
+static int quic_tx_connection_hash_cmp(struct rhashtable_compare_arg *arg,
+				       const void *ptr)
+{
+	const struct quic_connection_info_key *key = arg->key;
+	const struct quic_connection_rhash *x = ptr;
+
+	return !!memcmp(&x->crypto_ctx.conn_info.key,
+			key,
+			sizeof(struct quic_connection_info_key));
+}
+
+static const struct rhashtable_params quic_tx_connection_params = {
+	.key_len		= sizeof(struct quic_connection_info_key),
+	.head_offset		= offsetof(struct quic_connection_rhash, node),
+	.hashfn			= quic_tx_connection_hash,
+	.obj_hashfn		= quic_tx_connection_hash_obj,
+	.obj_cmpfn		= quic_tx_connection_hash_cmp,
+	.automatic_shrinking	= true,
+};
+
+static inline size_t quic_crypto_key_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_KEY_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_KEY_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_KEY_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return 0;
+}
+
+static inline size_t quic_crypto_tag_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_TAG_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_TAG_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return 0;
+}
+
+static inline size_t quic_crypto_iv_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE
+			     > QUIC_CIPHER_MAX_IV_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return 0;
+}
+
+static inline size_t quic_crypto_nonce_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_128_SALT_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_256_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_256_SALT_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_CCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_CCM_128_SALT_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+			     TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+		       TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return 0;
+}
+
+static inline
+u8 *quic_payload_iv(struct quic_internal_crypto_context *crypto_ctx)
+{
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return crypto_ctx->conn_info.aes_gcm_128.payload_iv;
+	case TLS_CIPHER_AES_GCM_256:
+		return crypto_ctx->conn_info.aes_gcm_256.payload_iv;
+	case TLS_CIPHER_AES_CCM_128:
+		return crypto_ctx->conn_info.aes_ccm_128.payload_iv;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return crypto_ctx->conn_info.chacha20_poly1305.payload_iv;
+	default:
+		break;
+	}
+	WARN_ON("Unsupported cipher type");
+	return NULL;
+}
+
+static int
+quic_config_header_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_skcipher *tfm;
+	char *header_cipher;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_128.header_key;
+		break;
+	case TLS_CIPHER_AES_GCM_256:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_256.header_key;
+		break;
+	case TLS_CIPHER_AES_CCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_ccm_128.header_key;
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		header_cipher = "chacha20";
+		key = crypto_ctx->conn_info.chacha20_poly1305.header_key;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	tfm = crypto_alloc_skcipher(header_cipher, 0, 0);
+	if (IS_ERR(tfm)) {
+		rc = PTR_ERR(tfm);
+		goto out;
+	}
+
+	rc = crypto_skcipher_setkey(tfm, key,
+				    quic_crypto_key_size(crypto_ctx->conn_info
+							 .cipher_type));
+	if (rc) {
+		crypto_free_skcipher(tfm);
+		goto out;
+	}
+
+	crypto_ctx->header_tfm = tfm;
+
+out:
+	return rc;
+}
+
+static int
+quic_config_packet_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_aead *aead;
+	char *cipher_name;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128: {
+		key = crypto_ctx->conn_info.aes_gcm_128.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_GCM_256: {
+		key = crypto_ctx->conn_info.aes_gcm_256.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_CCM_128: {
+		key = crypto_ctx->conn_info.aes_ccm_128.payload_key;
+		cipher_name = "ccm(aes)";
+		break;
+	}
+	case TLS_CIPHER_CHACHA20_POLY1305: {
+		key = crypto_ctx->conn_info.chacha20_poly1305.payload_key;
+		cipher_name = "rfc7539(chacha20,poly1305)";
+		break;
+	}
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	aead = crypto_alloc_aead(cipher_name, 0, 0);
+	if (IS_ERR(aead)) {
+		rc = PTR_ERR(aead);
+		goto out;
+	}
+
+	rc = crypto_aead_setkey(aead, key,
+				quic_crypto_key_size(crypto_ctx->conn_info
+						     .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	rc = crypto_aead_setauthsize(aead,
+				     quic_crypto_tag_size(crypto_ctx->conn_info
+							  .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	crypto_ctx->packet_aead = aead;
+	goto out;
+
+free_aead:
+	crypto_free_aead(aead);
+
+out:
+	return rc;
+}
+
+static inline struct quic_context *quic_get_ctx(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	return (__force void *)rcu_access_pointer(inet->ulp_data);
+}
+
+static void quic_free_cipher_page(struct page *page)
+{
+	__free_pages(page, QUIC_MAX_CIPHER_PAGES_ORDER);
+}
+
+static struct quic_context *quic_ctx_create(void)
+{
+	struct quic_context *ctx;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+
+	mutex_init(&ctx->sendmsg_mux);
+	ctx->cipher_page = alloc_pages(GFP_KERNEL, QUIC_MAX_CIPHER_PAGES_ORDER);
+	if (!ctx->cipher_page)
+		goto out_err;
+
+	if (rhashtable_init(&ctx->tx_connections,
+			    &quic_tx_connection_params) < 0) {
+		quic_free_cipher_page(ctx->cipher_page);
+		goto out_err;
+	}
+
+	return ctx;
+
+out_err:
+	kfree(ctx);
+	return NULL;
+}
+
+static int quic_getsockopt(struct sock *sk, int level, int optname,
+			   char __user *optval, int __user *optlen)
+{
+	struct quic_context *ctx = quic_get_ctx(sk);
+
+	return ctx->sk_proto->getsockopt(sk, level, optname, optval, optlen);
+}
+
+static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	int rc = 0;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	connhash = kzalloc(sizeof(*connhash), GFP_KERNEL);
+	if (!connhash)
+		return -ENOMEM;
+
+	crypto_ctx = &connhash->crypto_ctx;
+	rc = copy_from_sockptr(&crypto_ctx->conn_info, optval,
+			       sizeof(crypto_ctx->conn_info));
+	if (rc) {
+		rc = -EFAULT;
+		goto err_crypto_info;
+	}
+
+	// create all crypto materials for packet and header encryption
+	rc = quic_config_header_crypto(crypto_ctx);
+	if (rc)
+		goto err_crypto_info;
+
+	rc = quic_config_packet_crypto(crypto_ctx);
+	if (rc)
+		goto err_free_skcipher;
+
+	// insert crypto data into hash per connection ID
+	rc = rhashtable_insert_fast(&ctx->tx_connections, &connhash->node,
+				    quic_tx_connection_params);
+	if (rc < 0)
+		goto err_free_ciphers;
+
+	return 0;
+
+err_free_ciphers:
+	crypto_free_aead(crypto_ctx->packet_aead);
+
+err_free_skcipher:
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+
+err_crypto_info:
+	// wipe out all crypto materials
+	memzero_explicit(&connhash->crypto_ctx, sizeof(connhash->crypto_ctx));
+	kfree(connhash);
+	return rc;
+}
+
+static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	struct quic_connection_info conn_info;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&conn_info, optval, optlen))
+		return -EFAULT;
+
+	connhash = rhashtable_lookup_fast(&ctx->tx_connections,
+					  &conn_info.key,
+					  quic_tx_connection_params);
+	if (!connhash)
+		return -EINVAL;
+
+	rhashtable_remove_fast(&ctx->tx_connections,
+			       &connhash->node,
+			       quic_tx_connection_params);
+
+
+	crypto_ctx = &connhash->crypto_ctx;
+
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+	crypto_free_aead(crypto_ctx->packet_aead);
+	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	kfree(connhash);
+
+	return 0;
+}
+
+static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
+			      unsigned int optlen)
+{
+	int rc = 0;
+
+	switch (optname) {
+	case UDP_QUIC_ADD_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_add_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	case UDP_QUIC_DEL_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_del_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	default:
+		rc = -ENOPROTOOPT;
+		break;
+	}
+
+	return rc;
+}
+
+static int quic_setsockopt(struct sock *sk, int level, int optname,
+			   sockptr_t optval, unsigned int optlen)
+{
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	sk_proto = ctx->sk_proto;
+	rcu_read_unlock();
+
+	if (level == SOL_UDP &&
+	    (optname == UDP_QUIC_ADD_TX_CONNECTION ||
+	     optname == UDP_QUIC_DEL_TX_CONNECTION))
+		return do_quic_setsockopt(sk, optname, optval, optlen);
+
+	return sk_proto->setsockopt(sk, level, optname, optval, optlen);
+}
+
+static int
+quic_extract_ancillary_data(struct msghdr *msg,
+			    struct quic_tx_ancillary_data *ancillary_data,
+			    u16 *udp_pkt_size)
+{
+	struct cmsghdr *cmsg_hdr = NULL;
+	void *ancillary_data_ptr = NULL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	for_each_cmsghdr(cmsg_hdr, msg) {
+		if (!CMSG_OK(msg, cmsg_hdr))
+			return -EINVAL;
+
+		if (cmsg_hdr->cmsg_level != IPPROTO_UDP)
+			continue;
+
+		if (cmsg_hdr->cmsg_type == UDP_QUIC_ENCRYPT) {
+			if (cmsg_hdr->cmsg_len !=
+			    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+				return -EINVAL;
+			memcpy((void *)ancillary_data, CMSG_DATA(cmsg_hdr),
+			       sizeof(struct quic_tx_ancillary_data));
+			ancillary_data_ptr = cmsg_hdr;
+		} else if (cmsg_hdr->cmsg_type == UDP_SEGMENT) {
+			if (cmsg_hdr->cmsg_len != CMSG_LEN(sizeof(u16)))
+				return -EINVAL;
+			memcpy((void *)udp_pkt_size, CMSG_DATA(cmsg_hdr),
+			       sizeof(u16));
+		}
+	}
+
+	if (!ancillary_data_ptr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int quic_sendmsg_validate(struct msghdr *msg)
+{
+	if (!iter_is_iovec(&msg->msg_iter))
+		return -EINVAL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct quic_connection_rhash
+*quic_lookup_connection(struct quic_context *ctx,
+			u8 *conn_id,
+			struct quic_tx_ancillary_data *ancillary_data)
+{
+	struct quic_connection_info_key conn_key;
+
+	// Lookup connection information by the connection key.
+	memset(&conn_key, 0, sizeof(struct quic_connection_info_key));
+	// fill the connection id up to the max connection ID length
+	if (ancillary_data->conn_id_length > QUIC_MAX_CONNECTION_ID_SIZE)
+		return NULL;
+
+	conn_key.conn_id_length = ancillary_data->conn_id_length;
+	if (ancillary_data->conn_id_length)
+		memcpy(conn_key.conn_id,
+		       conn_id,
+		       ancillary_data->conn_id_length);
+	return rhashtable_lookup_fast(&ctx->tx_connections,
+				      &conn_key,
+				      quic_tx_connection_params);
+}
+
+static int quic_sg_capacity_from_msg(const size_t pkt_size,
+				     const off_t offset,
+				     const size_t length)
+{
+	size_t	pages = 0;
+	size_t	pkts = 0;
+
+	pages = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+	pkts = DIV_ROUND_UP(length, pkt_size);
+	return pages + pkts + 1;
+}
+
+static void quic_put_plain_user_pages(struct page **pages, size_t nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (i == 0 || pages[i] != pages[i - 1])
+			put_page(pages[i]);
+}
+
+static int quic_get_plain_user_pages(struct msghdr * const msg,
+				     struct page **pages,
+				     int *page_indices)
+{
+	size_t	nr_mapped = 0;
+	size_t	nr_pages = 0;
+	void	*data_addr;
+	void	*page_addr;
+	size_t	count = 0;
+	off_t	data_off;
+	int	ret = 0;
+	int	i;
+
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		data_addr = msg->msg_iter.iov[i].iov_base;
+		if (!i)
+			data_addr += msg->msg_iter.iov_offset;
+		page_addr =
+			(void *)((unsigned long)data_addr & PAGE_MASK);
+
+		data_off = (unsigned long)data_addr & ~PAGE_MASK;
+		nr_pages =
+			DIV_ROUND_UP(data_off + msg->msg_iter.iov[i].iov_len,
+				     PAGE_SIZE);
+		if (nr_mapped + nr_pages > QUIC_MAX_PLAIN_PAGES) {
+			quic_put_plain_user_pages(pages, nr_mapped);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		count = get_user_pages((unsigned long)page_addr, nr_pages, 1,
+				       pages, NULL);
+		if (count < nr_pages) {
+			quic_put_plain_user_pages(pages, nr_mapped + count);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		page_indices[i] = nr_mapped;
+		nr_mapped += count;
+		pages += count;
+	}
+	ret = nr_mapped;
+
+out:
+	return ret;
+}
+
+static int quic_sg_plain_from_mapped_msg(struct msghdr * const msg,
+					 struct page **plain_pages,
+					 void **iov_base_ptrs,
+					 void **iov_data_ptrs,
+					 const size_t plain_size,
+					 const size_t pkt_size,
+					 struct scatterlist * const sg_alloc,
+					 const size_t max_sg_alloc,
+					 struct scatterlist ** const sg_pkts,
+					 size_t *nr_plain_pages)
+{
+	int iov_page_indices[QUIC_MAX_IOVEC_SEGMENTS];
+	struct scatterlist *sg;
+	unsigned int pkt_i = 0;
+	ssize_t left_on_page;
+	size_t pkt_left;
+	unsigned int i;
+	size_t seg_len;
+	off_t page_ofs;
+	off_t seg_ofs;
+	int ret = 0;
+	int page_i;
+
+	if (msg->msg_iter.nr_segs >= QUIC_MAX_IOVEC_SEGMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = quic_get_plain_user_pages(msg, plain_pages, iov_page_indices);
+	if (ret < 0)
+		goto out;
+
+	*nr_plain_pages = ret;
+	sg = sg_alloc;
+	sg_pkts[pkt_i] = sg;
+	sg_unmark_end(sg);
+	pkt_left = pkt_size;
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		page_ofs = ((unsigned long)msg->msg_iter.iov[i].iov_base
+			   & (PAGE_SIZE - 1));
+		page_i = 0;
+		if (!i) {
+			page_ofs += msg->msg_iter.iov_offset;
+			while (page_ofs >= PAGE_SIZE) {
+				page_ofs -= PAGE_SIZE;
+				page_i++;
+			}
+		}
+
+		seg_len = msg->msg_iter.iov[i].iov_len;
+		page_i += iov_page_indices[i];
+
+		if (page_i >= QUIC_MAX_PLAIN_PAGES)
+			return -EFAULT;
+
+		seg_ofs = 0;
+		while (seg_ofs < seg_len) {
+			if (sg - sg_alloc > max_sg_alloc)
+				return -EFAULT;
+
+			sg_unmark_end(sg);
+			left_on_page = min_t(size_t, PAGE_SIZE - page_ofs,
+					     seg_len - seg_ofs);
+			if (left_on_page <= 0)
+				return -EFAULT;
+
+			if (left_on_page > pkt_left) {
+				sg_set_page(sg, plain_pages[page_i], pkt_left,
+					    page_ofs);
+				pkt_i++;
+				seg_ofs += pkt_left;
+				page_ofs += pkt_left;
+				sg_mark_end(sg);
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+				continue;
+			}
+			sg_set_page(sg, plain_pages[page_i], left_on_page,
+				    page_ofs);
+			page_i++;
+			page_ofs = 0;
+			seg_ofs += left_on_page;
+			pkt_left -= left_on_page;
+			if (pkt_left == 0 ||
+			    (seg_ofs == seg_len &&
+			     i == msg->msg_iter.nr_segs - 1)) {
+				sg_mark_end(sg);
+				pkt_i++;
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+			} else {
+				sg++;
+			}
+		}
+	}
+
+	if (pkt_left && pkt_left != pkt_size) {
+		pkt_i++;
+		sg_mark_end(sg);
+	}
+	ret = pkt_i;
+
+out:
+	return ret;
+}
+
+/* sg_alloc: allocated zeroed array of scatterlists
+ * cipher_page: preallocated compound page
+ */
+static int quic_sg_cipher_from_pkts(const size_t cipher_tag_size,
+				     const size_t plain_pkt_size,
+				     const size_t plain_size,
+				     struct page * const cipher_page,
+				     struct scatterlist * const sg_alloc,
+				     const size_t nr_sg_alloc,
+				     struct scatterlist ** const sg_cipher)
+{
+	const size_t cipher_pkt_size = plain_pkt_size + cipher_tag_size;
+	size_t pkts = DIV_ROUND_UP(plain_size, plain_pkt_size);
+	struct scatterlist *sg = sg_alloc;
+	int pkt_i;
+	void *ptr;
+
+	if (pkts > nr_sg_alloc)
+		return -EINVAL;
+
+	ptr = page_address(cipher_page);
+	for (pkt_i = 0; pkt_i < pkts;
+		++pkt_i, ptr += cipher_pkt_size, ++sg) {
+		sg_set_buf(sg, ptr, cipher_pkt_size);
+		sg_mark_end(sg);
+		sg_cipher[pkt_i] = sg;
+	}
+	return pkts;
+}
+
+/* fast copy from scatterlist to a buffer assuming that all pages are
+ * available in kernel memory.
+ */
+static int quic_sg_pcopy_to_buffer_kernel(struct scatterlist *sg,
+					  u8 *buffer,
+					  size_t bytes_to_copy,
+					  off_t offset_to_read)
+{
+	off_t sg_remain = sg->length;
+	size_t to_copy;
+
+	if (!bytes_to_copy)
+		return 0;
+
+	// skip to offset first
+	while (offset_to_read > 0) {
+		if (!sg_remain)
+			return -EINVAL;
+		if (offset_to_read < sg_remain) {
+			sg_remain -= offset_to_read;
+			break;
+		}
+		offset_to_read -= sg_remain;
+		sg = sg_next(sg);
+		if (!sg)
+			return -EINVAL;
+		sg_remain = sg->length;
+	}
+
+	// traverse sg list from offset to offset + bytes_to_copy
+	while (bytes_to_copy) {
+		to_copy = min_t(size_t, bytes_to_copy, sg_remain);
+		if (!to_copy)
+			return -EINVAL;
+		memcpy(buffer, sg_virt(sg) + (sg->length - sg_remain), to_copy);
+		buffer += to_copy;
+		bytes_to_copy -= to_copy;
+		if (bytes_to_copy) {
+			sg = sg_next(sg);
+			if (!sg)
+				return -EINVAL;
+			sg_remain = sg->length;
+		}
+	}
+
+	return 0;
+}
+
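+/* Copy the QUIC short header out of the plaintext scatterlist: one flags
+ * byte, the destination connection ID, and a 1-4 byte packet number whose
+ * length is encoded in the two low bits of the flags.  Returns the header
+ * length or a negative errno.
+ */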
+static int quic_copy_header(struct scatterlist *sg_plain,
+			    u8 *buf, const size_t buf_len,
+			    const size_t conn_id_len)
+{
+	u8 *pkt = sg_virt(sg_plain);
+	size_t hdr_len;
+
+	hdr_len = 1 + conn_id_len + ((*pkt & 0x03) + 1);
+	if (hdr_len > QUIC_MAX_SHORT_HEADER_SIZE || hdr_len > buf_len)
+		return -EINVAL;
+
+	WARN_ON_ONCE(quic_sg_pcopy_to_buffer_kernel(sg_plain, buf, hdr_len, 0));
+	return hdr_len;
+}
+
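+/* Recover the full packet number from the truncated value carried in the
+ * short header, using the expected next packet number supplied in the Tx
+ * ancillary data.  This follows the packet number decoding algorithm of
+ * RFC 9000, Appendix A.3.
+ */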
+static u64 quic_unpack_pkt_num(struct quic_tx_ancillary_data * const control,
+			       const u8 * const hdr,
+			       const off_t payload_crypto_off)
+{
+	u64 truncated_pn = 0;
+	u64 candidate_pn;
+	u64 expected_pn;
+	u64 pn_hwin;
+	u64 pn_mask;
+	u64 pn_len;
+	u64 pn_win;
+	int i;
+
+	pn_len = (hdr[0] & 0x03) + 1;
+	expected_pn = control->next_pkt_num;
+
+	for (i = 1 + control->conn_id_length; i < payload_crypto_off; ++i) {
+		truncated_pn <<= 8;
+		truncated_pn |= hdr[i];
+	}
+
+	pn_win = 1ULL << (pn_len << 3);
+	pn_hwin = pn_win >> 1;
+	pn_mask = pn_win - 1;
+	candidate_pn = (expected_pn & ~pn_mask) | truncated_pn;
+
+	if (expected_pn > pn_hwin &&
+	    candidate_pn <= expected_pn - pn_hwin &&
+	    candidate_pn < (1ULL << 62) - pn_win)
+		return candidate_pn + pn_win;
+
+	if (candidate_pn > expected_pn + pn_hwin &&
+	    candidate_pn >= pn_win)
+		return candidate_pn - pn_win;
+
+	return candidate_pn;
+}
+
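+/* Derive the header protection mask from a sample of the packet
+ * ciphertext, as described in RFC 9001, Section 5.4.  For the AES based
+ * suites the sample itself is encrypted with the header key; for
+ * CHACHA20_POLY1305 the first four bytes of the sample are used as the
+ * block counter and the remaining bytes as the nonce.
+ */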
+static int
+quic_construct_header_prot_mask(struct quic_internal_crypto_context *crypto_ctx,
+				struct skcipher_request *hdr_mask_req,
+				struct scatterlist *sg_cipher_pkt,
+				off_t sample_offset,
+				u8 *hdr_mask)
+{
+	u8 *sample = sg_virt(sg_cipher_pkt) + sample_offset;
+	u8 hdr_ctr[sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE];
+	struct scatterlist sg_cipher_sample;
+	struct scatterlist sg_hdr_mask;
+	DECLARE_CRYPTO_WAIT(wait_header);
+	u32 counter;
+
+	BUILD_BUG_ON(QUIC_HDR_MASK_SIZE
+		     < sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE);
+
+	// cipher pages are contiguous, get the pointer to the sg data directly;
+	// the page is allocated in kernel memory
+	sg_init_one(&sg_cipher_sample, sample, QUIC_HDR_MASK_SIZE);
+	sg_init_one(&sg_hdr_mask, hdr_mask, QUIC_HDR_MASK_SIZE);
+	skcipher_request_set_callback(hdr_mask_req, 0, crypto_req_done,
+				      &wait_header);
+
+	if (crypto_ctx->conn_info.cipher_type == TLS_CIPHER_CHACHA20_POLY1305) {
+		counter = cpu_to_le32(*((u32 *)sample));
+		memset(hdr_ctr, 0, sizeof(hdr_ctr));
+		memcpy((u8 *)hdr_ctr, (u8 *)&counter, sizeof(u32));
+		memcpy((u8 *)hdr_ctr + sizeof(u32),
+		       (sample + sizeof(u32)),
+		       QUIC_CIPHER_MAX_IV_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, 5, hdr_ctr);
+	} else {
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, QUIC_HDR_MASK_SIZE,
+					   NULL);
+	}
+
+	return crypto_wait_req(crypto_skcipher_encrypt(hdr_mask_req),
+			       &wait_header);
+}
+
+static int quic_protect_header(struct quic_internal_crypto_context *crypto_ctx,
+			       struct quic_tx_ancillary_data *control,
+			       struct skcipher_request *hdr_mask_req,
+			       struct scatterlist *sg_cipher_pkt,
+			       int payload_crypto_off)
+{
+	u8 hdr_mask[QUIC_HDR_MASK_SIZE];
+	off_t quic_pkt_num_off;
+	u8 quic_pkt_num_len;
+	u8 *cipher_hdr;
+	int err;
+	int i;
+
+	quic_pkt_num_off = 1 + control->conn_id_length;
+	quic_pkt_num_len = payload_crypto_off - quic_pkt_num_off;
+
+	if (quic_pkt_num_len > 4)
+		return -EPERM;
+
+	err = quic_construct_header_prot_mask(crypto_ctx, hdr_mask_req,
+					      sg_cipher_pkt,
+					      payload_crypto_off +
+					      (4 - quic_pkt_num_len),
+					      hdr_mask);
+	if (unlikely(err))
+		return err;
+
+	cipher_hdr = sg_virt(sg_cipher_pkt);
+	// protect the public flags
+	cipher_hdr[0] ^= (hdr_mask[0] & 0x1f);
+
+	for (i = 0; i < quic_pkt_num_len; ++i)
+		cipher_hdr[quic_pkt_num_off + i] ^= hdr_mask[1 + i];
+
+	return 0;
+}
+
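+/* Build the per-packet AEAD nonce per RFC 9001, Section 5.3: the
+ * connection IV XORed with the packet number, left-padded with zeroes to
+ * the nonce length.
+ */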
+static
+void quic_construct_ietf_nonce(u8 *nonce,
+			       struct quic_internal_crypto_context *crypto_ctx,
+			       u64 quic_pkt_num)
+{
+	u8 *iv = quic_payload_iv(crypto_ctx);
+	int i;
+
+	for (i = quic_crypto_nonce_size(crypto_ctx->conn_info.cipher_type) - 1;
+	     i >= 0 && quic_pkt_num;
+	     --i, quic_pkt_num >>= 8)
+		nonce[i] = iv[i] ^ (u8)quic_pkt_num;
+
+	for (; i >= 0; --i)
+		nonce[i] = iv[i];
+}
+
+ssize_t quic_sendpage(struct quic_context *ctx,
+		      struct sock *sk,
+		      struct msghdr *msg,
+		      const size_t cipher_size,
+		      struct page * const cipher_page)
+{
+	struct kvec iov;
+	ssize_t ret;
+
+	iov.iov_base = page_address(cipher_page);
+	iov.iov_len = cipher_size;
+	iov_iter_kvec(&msg->msg_iter, WRITE, &iov, 1, cipher_size);
+	ret = security_socket_sendmsg(sk->sk_socket, msg, msg_data_left(msg));
+	if (ret)
+		return ret;
+
+	ret = ctx->sk_proto->sendmsg(sk, msg, msg_data_left(msg));
+	WARN_ON(ret == -EIOCBQUEUED);
+	return ret;
+}
+
+static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_internal_crypto_context *crypto_ctx = NULL;
+	struct scatterlist *sg_cipher_pkts[QUIC_MAX_GSO_FRAGS];
+	struct scatterlist *sg_plain_pkts[QUIC_MAX_GSO_FRAGS];
+	struct page *plain_pages[QUIC_MAX_PLAIN_PAGES];
+	void *plain_base_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	void *plain_data_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	struct msghdr msg_cipher = {
+		.msg_name = msg->msg_name,
+		.msg_namelen = msg->msg_namelen,
+		.msg_flags = msg->msg_flags,
+		.msg_control = msg->msg_control,
+		.msg_controllen = msg->msg_controllen,
+	};
+	struct quic_connection_rhash *connhash = NULL;
+	struct quic_connection_info *conn_info = NULL;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	u8 hdr_buf[QUIC_MAX_SHORT_HEADER_SIZE];
+	struct skcipher_request *hdr_mask_req;
+	struct quic_tx_ancillary_data control;
+	u8 nonce[QUIC_CIPHER_MAX_NONCE_SIZE];
+	struct aead_request *aead_req = NULL;
+	struct scatterlist *sg_cipher = NULL;
+	struct udp_sock *up = udp_sk(sk);
+	struct scatterlist *sg_plain = NULL;
+	u16 gso_pkt_size = up->gso_size;
+	size_t last_plain_pkt_size = 0;
+	off_t payload_crypto_offset;
+	struct crypto_aead *tfm = NULL;
+	size_t nr_plain_pages = 0;
+	DECLARE_CRYPTO_WAIT(waiter);
+	size_t nr_sg_cipher_pkts;
+	size_t nr_sg_plain_pkts;
+	ssize_t hdr_buf_len = 0;
+	size_t nr_sg_alloc = 0;
+	size_t plain_pkt_size;
+	u64 full_pkt_num;
+	size_t cipher_size;
+	size_t plain_size;
+	size_t pkt_size;
+	size_t tag_size;
+	int ret = 0;
+	int pkt_i;
+	int err;
+
+	memset(&hdr_buf[0], 0, QUIC_MAX_SHORT_HEADER_SIZE);
+	hdr_buf_len = copy_from_iter(hdr_buf, QUIC_MAX_SHORT_HEADER_SIZE,
+				     &msg->msg_iter);
+	if (hdr_buf_len <= 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+	iov_iter_revert(&msg->msg_iter, hdr_buf_len);
+
+	// Bypass for anything that is guaranteed not QUIC.
+	plain_size = len;
+
+	if (plain_size < 2)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Bypass for other than short header.
+	if ((hdr_buf[0] & 0xc0) != 0x40)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Crypto adds a tag after the packet. Corking a payload would produce
+	// a crypto tag after each portion. Use GSO instead.
+	if ((msg->msg_flags & MSG_MORE) || up->pending) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = quic_sendmsg_validate(msg);
+	if (ret)
+		goto out;
+
+	ret = quic_extract_ancillary_data(msg, &control, &gso_pkt_size);
+	if (ret)
+		goto out;
+
+	// Reject unknown flag bits in the ancillary data.
+	if (control.flags & ~QUIC_ANCILLARY_FLAGS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Bypass offload on request. A bypass flag on the first packet
+	// applies to all packets in the GSO batch.
+	if (control.flags & QUIC_BYPASS_ENCRYPTION)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	if (hdr_buf_len < 1 + control.conn_id_length) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Fetch the flow
+	connhash = quic_lookup_connection(ctx, &hdr_buf[1], &control);
+	if (!connhash) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	crypto_ctx = &connhash->crypto_ctx;
+	conn_info = &crypto_ctx->conn_info;
+
+	tag_size = quic_crypto_tag_size(crypto_ctx->conn_info.cipher_type);
+
+	// For GSO, use the GSO size minus cipher tag size as the packet size;
+	// for non-GSO, use the size of the whole plaintext.
+	// Reduce the packet size by tag size to keep the original packet size
+	// for the rest of the UDP path in the stack.
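+	// For example, with UDP_SEGMENT set to 1200 and the 16-byte AES-GCM
+	// tag, the plaintext is sliced into 1184-byte chunks, which is the
+	// layout the selftests later in this series rely on.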
+	if (!gso_pkt_size) {
+		plain_pkt_size = plain_size;
+	} else {
+		if (gso_pkt_size <= tag_size) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		plain_pkt_size = gso_pkt_size - tag_size;
+	}
+
+	// Build scatterlist from the input data, split by GSO minus the
+	// crypto tag size.
+	nr_sg_alloc = quic_sg_capacity_from_msg(plain_pkt_size,
+						msg->msg_iter.iov_offset,
+						plain_size);
+	if ((nr_sg_alloc * 2) >= QUIC_MAX_SG_ALLOC_ELEMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	sg_plain = ctx->sg_alloc;
+	sg_cipher = sg_plain + nr_sg_alloc;
+
+	ret = quic_sg_plain_from_mapped_msg(msg, plain_pages,
+					    plain_base_ptrs,
+					    plain_data_ptrs, plain_size,
+					    plain_pkt_size, sg_plain,
+					    nr_sg_alloc, sg_plain_pkts,
+					    &nr_plain_pages);
+
+	if (ret < 0)
+		goto out_put_pages;
+
+	nr_sg_plain_pkts = ret;
+	last_plain_pkt_size = plain_size % plain_pkt_size;
+	if (!last_plain_pkt_size)
+		last_plain_pkt_size = plain_pkt_size;
+
+	// Build scatterlist for the ciphertext, split by GSO.
+	cipher_size = plain_size + nr_sg_plain_pkts * tag_size;
+
+	if (DIV_ROUND_UP(cipher_size, PAGE_SIZE)
+	    >= (1 << QUIC_MAX_CIPHER_PAGES_ORDER)) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	ret = quic_sg_cipher_from_pkts(tag_size, plain_pkt_size, plain_size,
+				       ctx->cipher_page, sg_cipher, nr_sg_alloc,
+				       sg_cipher_pkts);
+	if (ret < 0)
+		goto out_put_pages;
+
+	nr_sg_cipher_pkts = ret;
+
+	if (nr_sg_plain_pkts != nr_sg_cipher_pkts) {
+		ret = -EPERM;
+		goto out_put_pages;
+	}
+
+	// Encrypt and protect header for each packet individually.
+	tfm = crypto_ctx->packet_aead;
+	crypto_aead_clear_flags(tfm, ~0);
+	aead_req = aead_request_alloc(tfm, GFP_KERNEL);
+	if (!aead_req) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	hdr_mask_req = skcipher_request_alloc(crypto_ctx->header_tfm,
+					      GFP_KERNEL);
+	if (!hdr_mask_req) {
+		aead_request_free(aead_req);
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	for (pkt_i = 0; pkt_i < nr_sg_plain_pkts; ++pkt_i) {
+		payload_crypto_offset =
+			quic_copy_header(sg_plain_pkts[pkt_i],
+					 hdr_buf,
+					 sizeof(hdr_buf),
+					 control.conn_id_length);
+		if (payload_crypto_offset < 0) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = payload_crypto_offset;
+			goto out_put_pages;
+		}
+
+		full_pkt_num = quic_unpack_pkt_num(&control, hdr_buf,
+						   payload_crypto_offset);
+
+		pkt_size = (pkt_i + 1 < nr_sg_plain_pkts
+				? plain_pkt_size
+				: last_plain_pkt_size);
+		/* pkt_size is unsigned, so validate the header length before
+		 * subtracting it rather than testing for a negative result.
+		 */
+		if (pkt_size <= payload_crypto_offset) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		pkt_size -= payload_crypto_offset;
+
+		/* Construct nonce and initialize request */
+		quic_construct_ietf_nonce(nonce, crypto_ctx, full_pkt_num);
+
+		/* Encrypt the body */
+		aead_request_set_callback(aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG
+					  | CRYPTO_TFM_REQ_MAY_SLEEP,
+					  crypto_req_done, &waiter);
+		aead_request_set_crypt(aead_req, sg_plain_pkts[pkt_i],
+				       sg_cipher_pkts[pkt_i],
+				       pkt_size,
+				       nonce);
+		aead_request_set_ad(aead_req, payload_crypto_offset);
+		err = crypto_wait_req(crypto_aead_encrypt(aead_req), &waiter);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+
+		/* Protect the header */
+		memcpy(sg_virt(sg_cipher_pkts[pkt_i]), hdr_buf,
+		       payload_crypto_offset);
+
+		err = quic_protect_header(crypto_ctx, &control,
+					  hdr_mask_req,
+					  sg_cipher_pkts[pkt_i],
+					  payload_crypto_offset);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+	}
+	skcipher_request_free(hdr_mask_req);
+	aead_request_free(aead_req);
+
+	// Deliver to the next layer.
+	if (ctx->sk_proto->sendpage) {
+		msg_cipher.msg_flags |= MSG_MORE;
+		err = ctx->sk_proto->sendmsg(sk, &msg_cipher, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+
+		err = ctx->sk_proto->sendpage(sk, ctx->cipher_page, 0,
+					      cipher_size, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+		if (err != cipher_size) {
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		ret = plain_size;
+	} else {
+		ret = quic_sendpage(ctx, sk, &msg_cipher, cipher_size,
+				    ctx->cipher_page);
+		// indicate full plaintext transmission to the caller.
+		if (ret > 0)
+			ret = plain_size;
+	}
+
+out_put_pages:
+	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
+
+out:
+	return ret;
+}
+
+static int quic_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_context *ctx;
+	int ret;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	rcu_read_unlock();
+	if (!ctx)
+		return -EINVAL;
+
+	mutex_lock(&ctx->sendmsg_mux);
+	ret = quic_sendmsg(sk, msg, len);
+	mutex_unlock(&ctx->sendmsg_mux);
+	return ret;
+}
+
+static void quic_release_resources(struct sock *sk)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_connection_rhash *connhash;
+	struct inet_sock *inet = inet_sk(sk);
+	struct rhashtable_iter hti;
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	if (!ctx) {
+		rcu_read_unlock();
+		return;
+	}
+
+	sk_proto = ctx->sk_proto;
+
+	rhashtable_walk_enter(&ctx->tx_connections, &hti);
+	rhashtable_walk_start(&hti);
+
+	while ((connhash = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(connhash)) {
+			if (PTR_ERR(connhash) == -EAGAIN)
+				continue;
+			break;
+		}
+
+		crypto_ctx = &connhash->crypto_ctx;
+		crypto_free_aead(crypto_ctx->packet_aead);
+		crypto_free_skcipher(crypto_ctx->header_tfm);
+		memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	}
+
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+	rhashtable_destroy(&ctx->tx_connections);
+
+	if (ctx->cipher_page) {
+		quic_free_cipher_page(ctx->cipher_page);
+		ctx->cipher_page = NULL;
+	}
+
+	rcu_read_unlock();
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, NULL);
+	WRITE_ONCE(sk->sk_prot, sk_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+
+	kfree_rcu(ctx, rcu);
+}
+
+static void
+quic_prep_protos(unsigned int af, struct proto *proto, const struct proto *base)
+{
+	if (likely(test_bit(af, &af_init_done)))
+		return;
+
+	spin_lock(&quic_proto_lock);
+	if (test_bit(af, &af_init_done))
+		goto out_unlock;
+
+	*proto			= *base;
+	proto->setsockopt	= quic_setsockopt;
+	proto->getsockopt	= quic_getsockopt;
+	proto->sendmsg		= quic_sendmsg_locked;
+
+	smp_mb__before_atomic(); /* proto calls should be visible first */
+	set_bit(af, &af_init_done);
+
+out_unlock:
+	spin_unlock(&quic_proto_lock);
+}
+
+static void quic_update_proto(struct sock *sk, struct quic_context *ctx)
+{
+	struct proto *udp_proto, *quic_proto;
+	struct inet_sock *inet = inet_sk(sk);
+
+	udp_proto = READ_ONCE(sk->sk_prot);
+	ctx->sk_proto = udp_proto;
+	quic_proto = sk->sk_family == AF_INET ? &quic_v4_proto : &quic_v6_proto;
+
+	quic_prep_protos(sk->sk_family, quic_proto, udp_proto);
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, ctx);
+	WRITE_ONCE(sk->sk_prot, quic_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+}
+
+static int quic_init(struct sock *sk)
+{
+	struct quic_context *ctx;
+
+	ctx = quic_ctx_create();
+	if (!ctx)
+		return -ENOMEM;
+
+	quic_update_proto(sk, ctx);
+
+	return 0;
+}
+
+static void quic_release(struct sock *sk)
+{
+	lock_sock(sk);
+	quic_release_resources(sk);
+	release_sock(sk);
+}
+
+static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
+	.name		= "quic-crypto",
+	.owner		= THIS_MODULE,
+	.init		= quic_init,
+	.release	= quic_release,
+};
+
+static int __init quic_register(void)
+{
+	udp_register_ulp(&quic_ulp_ops);
+	return 0;
+}
+
+static void __exit quic_unregister(void)
+{
+	udp_unregister_ulp(&quic_ulp_ops);
+}
+
+module_init(quic_register);
+module_exit(quic_unregister);
+
+MODULE_DESCRIPTION("QUIC crypto ULP");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_UDP_ULP("quic-crypto");
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next v2 5/6] Add flow counters and Tx processing error counter
  2022-08-06  0:11 ` [RFC net-next v2 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (3 preceding siblings ...)
  2022-08-06  0:11   ` [RFC net-next v2 4/6] Implement QUIC offload functions Adel Abouchaev
@ 2022-08-06  0:11   ` Adel Abouchaev
  2022-08-06  0:11   ` [RFC net-next v2 6/6] Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-06  0:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add flow counters. The total counter is cumulative, the current counter
shows the number of flows currently in flight, and the error counter
accumulates the number of Tx processing errors.
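
For illustration, this is roughly how the counters are expected to read
once the ULP is in use; the counter names come from quic_proc.c below,
the values here are made up:

    $ cat /proc/net/quic_stat
    QuicCurrTxSw                    	1
    QuicTxSw                        	3
    QuicTxSwError                   	0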

Signed-off-by: Adel Abouchaev <adelab@fb.com>
---
 include/net/netns/mib.h   |  3 +++
 include/net/quic.h        | 10 +++++++++
 include/net/snmp.h        |  6 +++++
 include/uapi/linux/snmp.h | 11 ++++++++++
 net/quic/Makefile         |  2 +-
 net/quic/quic_main.c      | 46 +++++++++++++++++++++++++++++++++++++++
 net/quic/quic_proc.c      | 45 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 net/quic/quic_proc.c

diff --git a/include/net/netns/mib.h b/include/net/netns/mib.h
index 7e373664b1e7..dcbba3d1ceec 100644
--- a/include/net/netns/mib.h
+++ b/include/net/netns/mib.h
@@ -24,6 +24,9 @@ struct netns_mib {
 #if IS_ENABLED(CONFIG_TLS)
 	DEFINE_SNMP_STAT(struct linux_tls_mib, tls_statistics);
 #endif
+#if IS_ENABLED(CONFIG_QUIC)
+	DEFINE_SNMP_STAT(struct linux_quic_mib, quic_statistics);
+#endif
 #ifdef CONFIG_MPTCP
 	DEFINE_SNMP_STAT(struct mptcp_mib, mptcp_statistics);
 #endif
diff --git a/include/net/quic.h b/include/net/quic.h
index 15e04ea08c53..b6327f3b7632 100644
--- a/include/net/quic.h
+++ b/include/net/quic.h
@@ -25,6 +25,16 @@
 #define QUIC_MAX_PLAIN_PAGES		16
 #define QUIC_MAX_CIPHER_PAGES_ORDER	4
 
+#define __QUIC_INC_STATS(net, field)				\
+	__SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_INC_STATS(net, field)				\
+	SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_DEC_STATS(net, field)				\
+	SNMP_DEC_STATS((net)->mib.quic_statistics, field)
+
+int __net_init quic_proc_init(struct net *net);
+void __net_exit quic_proc_fini(struct net *net);
+
 struct quic_internal_crypto_context {
 	struct quic_connection_info	conn_info;
 	struct crypto_skcipher		*header_tfm;
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 468a67836e2f..f94680a3e9e8 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -117,6 +117,12 @@ struct linux_tls_mib {
 	unsigned long	mibs[LINUX_MIB_TLSMAX];
 };
 
+/* Linux QUIC */
+#define LINUX_MIB_QUICMAX	__LINUX_MIB_QUICMAX
+struct linux_quic_mib {
+	unsigned long	mibs[LINUX_MIB_QUICMAX];
+};
+
 #define DEFINE_SNMP_STAT(type, name)	\
 	__typeof__(type) __percpu *name
 #define DEFINE_SNMP_STAT_ATOMIC(type, name)	\
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 4d7470036a8b..7bb2768b528a 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -349,4 +349,15 @@ enum
 	__LINUX_MIB_TLSMAX
 };
 
+/* linux QUIC mib definitions */
+enum
+{
+	LINUX_MIB_QUICNUM = 0,
+	LINUX_MIB_QUICCURRTXSW,			/* QuicCurrTxSw */
+	LINUX_MIB_QUICTXSW,			/* QuicTxSw */
+	LINUX_MIB_QUICTXSWERROR,		/* QuicTxSwError */
+	__LINUX_MIB_QUICMAX
+};
+
+
 #endif	/* _LINUX_SNMP_H */
diff --git a/net/quic/Makefile b/net/quic/Makefile
index 928239c4d08c..a885cd8bc4e0 100644
--- a/net/quic/Makefile
+++ b/net/quic/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_QUIC) += quic.o
 
-quic-y := quic_main.o
+quic-y := quic_main.o quic_proc.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
index e738c8130a4f..eb0fdeabd3c4 100644
--- a/net/quic/quic_main.c
+++ b/net/quic/quic_main.c
@@ -362,6 +362,8 @@ static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
 	if (rc < 0)
 		goto err_free_ciphers;
 
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSW);
 	return 0;
 
 err_free_ciphers:
@@ -411,6 +413,7 @@ static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
 	crypto_free_aead(crypto_ctx->packet_aead);
 	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
 	kfree(connhash);
+	QUIC_DEC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
 
 	return 0;
 }
@@ -436,6 +439,9 @@ static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 		break;
 	}
 
+	if (rc)
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return rc;
 }
 
@@ -1242,6 +1248,9 @@ static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
 
 out:
+	if (unlikely(ret < 0))
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return ret;
 }
 
@@ -1374,6 +1383,36 @@ static void quic_release(struct sock *sk)
 	release_sock(sk);
 }
 
+static int __net_init quic_init_net(struct net *net)
+{
+	int err;
+
+	net->mib.quic_statistics = alloc_percpu(struct linux_quic_mib);
+	if (!net->mib.quic_statistics)
+		return -ENOMEM;
+
+	err = quic_proc_init(net);
+	if (err)
+		goto err_free_stats;
+
+	return 0;
+
+err_free_stats:
+	free_percpu(net->mib.quic_statistics);
+	return err;
+}
+
+static void __net_exit quic_exit_net(struct net *net)
+{
+	quic_proc_fini(net);
+	free_percpu(net->mib.quic_statistics);
+}
+
+static struct pernet_operations quic_proc_ops = {
+	.init = quic_init_net,
+	.exit = quic_exit_net,
+};
+
 static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 	.name		= "quic-crypto",
 	.owner		= THIS_MODULE,
@@ -1383,6 +1422,12 @@ static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 
 static int __init quic_register(void)
 {
+	int err;
+
+	err = register_pernet_subsys(&quic_proc_ops);
+	if (err)
+		return err;
+
 	udp_register_ulp(&quic_ulp_ops);
 	return 0;
 }
@@ -1390,6 +1435,7 @@ static int __init quic_register(void)
 static void __exit quic_unregister(void)
 {
 	udp_unregister_ulp(&quic_ulp_ops);
+	unregister_pernet_subsys(&quic_proc_ops);
 }
 
 module_init(quic_register);
diff --git a/net/quic/quic_proc.c b/net/quic/quic_proc.c
new file mode 100644
index 000000000000..cb4fe7a589b5
--- /dev/null
+++ b/net/quic/quic_proc.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Meta Platforms, Inc. */
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <net/snmp.h>
+#include <net/quic.h>
+
+#ifdef CONFIG_PROC_FS
+static const struct snmp_mib quic_mib_list[] = {
+	SNMP_MIB_ITEM("QuicCurrTxSw", LINUX_MIB_QUICCURRTXSW),
+	SNMP_MIB_ITEM("QuicTxSw", LINUX_MIB_QUICTXSW),
+	SNMP_MIB_ITEM("QuicTxSwError", LINUX_MIB_QUICTXSWERROR),
+	SNMP_MIB_SENTINEL
+};
+
+static int quic_statistics_seq_show(struct seq_file *seq, void *v)
+{
+	unsigned long buf[LINUX_MIB_QUICMAX] = {};
+	struct net *net = seq->private;
+	int i;
+
+	snmp_get_cpu_field_batch(buf, quic_mib_list, net->mib.quic_statistics);
+	for (i = 0; quic_mib_list[i].name; i++)
+		seq_printf(seq, "%-32s\t%lu\n", quic_mib_list[i].name, buf[i]);
+
+	return 0;
+}
+#endif
+
+int __net_init quic_proc_init(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	if (!proc_create_net_single("quic_stat", 0444, net->proc_net,
+				    quic_statistics_seq_show, NULL))
+		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
+
+	return 0;
+}
+
+void __net_exit quic_proc_fini(struct net *net)
+{
+	remove_proc_entry("quic_stat", net->proc_net);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC net-next v2 6/6] Add self tests for ULP operations, flow setup and crypto tests
  2022-08-06  0:11 ` [RFC net-next v2 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (4 preceding siblings ...)
  2022-08-06  0:11   ` [RFC net-next v2 5/6] Add flow counters and Tx processing error counter Adel Abouchaev
@ 2022-08-06  0:11   ` Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-06  0:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add self tests for ULP operations, flow setup and crypto tests.
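
A rough sketch of how the suite is meant to be driven, assuming the
ns11/ns12/ns2 namespaces and veth links from quic.sh below have been
created and addressed, and the quic-crypto ULP is available in the
kernel under test:

    $ sudo ./quic.sh          # set up the test network namespaces
    $ sudo ./quic             # run the kselftest harness binary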

Signed-off-by: Adel Abouchaev <adelab@fb.com>

---

v2: Restored the test build. Changed the QUIC context reference variable
names for the keys and iv to match the uAPI.
---
 tools/testing/selftests/net/.gitignore |    3 +-
 tools/testing/selftests/net/Makefile   |    2 +-
 tools/testing/selftests/net/quic.c     | 1024 ++++++++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   45 ++
 4 files changed, 1072 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 892306bdb47d..134b50f2ceb9 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -38,4 +38,5 @@ ioam6_parser
 toeplitz
 tun
 cmsg_sender
-unix_connect
\ No newline at end of file
+unix_connect
+quic
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index e2dfef8b78a7..034495ce1941 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -57,7 +57,7 @@ TEST_GEN_FILES += ipsec
 TEST_GEN_FILES += ioam6_parser
 TEST_GEN_FILES += gro
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
-TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun
+TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun quic
 TEST_GEN_FILES += toeplitz
 TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
diff --git a/tools/testing/selftests/net/quic.c b/tools/testing/selftests/net/quic.c
new file mode 100644
index 000000000000..8e746a083140
--- /dev/null
+++ b/tools/testing/selftests/net/quic.c
@@ -0,0 +1,1024 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/limits.h>
+#include <linux/quic.h>
+#include <linux/socket.h>
+#include <linux/tls.h>
+#include <linux/tcp.h>
+#include <linux/types.h>
+#include <linux/udp.h>
+
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/sendfile.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+
+#include "../kselftest_harness.h"
+
+#define UDP_ULP		105
+
+#ifndef SOL_UDP
+#define SOL_UDP		17
+#endif
+
+// 1. QUIC ULP Registration Test
+
+FIXTURE(quic_ulp)
+{
+	int sfd;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_ulp)
+{
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv4)
+{
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7101,
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv6)
+{
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7102,
+};
+
+FIXTURE_SETUP(quic_ulp)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+		   sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+FIXTURE_TEARDOWN(quic_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_nonexistent_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "nonexistent", sizeof("nonexistent")), -1);
+	// If UDP_ULP option is not present, the error would be ENOPROTOOPT.
+	ASSERT_EQ(errno, ENOENT);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_quic_crypto_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+// 2. QUIC Data Path Operation Tests
+
+#define DO_NOT_SETUP_FLOW 0
+#define SETUP_FLOW 1
+
+#define DO_NOT_USE_CLIENT 0
+#define USE_CLIENT 1
+
+FIXTURE(quic_data)
+{
+	int sfd, c1fd, c2fd;
+	socklen_t len_c1;
+	socklen_t len_c2;
+	socklen_t len_s;
+
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_1;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_2;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_1_net_ns_fd;
+	int client_2_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_data)
+{
+	unsigned int af_client_1;
+	char *client_1_address;
+	unsigned short client_1_port;
+	uint8_t conn_id_1[8];
+	uint8_t conn_1_key[16];
+	uint8_t conn_1_iv[12];
+	uint8_t conn_1_hdr_key[16];
+	size_t conn_id_1_len;
+	bool setup_flow_1;
+	bool use_client_1;
+	unsigned int af_client_2;
+	char *client_2_address;
+	unsigned short client_2_port;
+	uint8_t conn_id_2[8];
+	uint8_t conn_2_key[16];
+	uint8_t conn_2_iv[12];
+	uint8_t conn_2_hdr_key[16];
+	size_t conn_id_2_len;
+	bool setup_flow_2;
+	bool use_client_2;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv4)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.1",
+	.client_1_port = 6667,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6668,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	//.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 6669,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_two_conns)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.1",
+	.client_1_port = 6670,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6671,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6672,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv4_one_conn)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.3",
+	.client_1_port = 6676,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6676,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6677,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv6_one_conn)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.3",
+	.client_1_port = 6678,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6678,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6679,
+};
+
+FIXTURE_SETUP(quic_data)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client_1 == AF_INET) {
+		self->len_c1 = sizeof(self->client_1.addr);
+		self->client_1.addr.sin_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr.sin_addr);
+		self->client_1.addr.sin_port = htons(variant->client_1_port);
+	} else {
+		self->len_c1 = sizeof(self->client_1.addr6);
+		self->client_1.addr6.sin6_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr6.sin6_addr);
+		self->client_1.addr6.sin6_port = htons(variant->client_1_port);
+	}
+
+	if (variant->af_client_2 == AF_INET) {
+		self->len_c2 = sizeof(self->client_2.addr);
+		self->client_2.addr.sin_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr.sin_addr);
+		self->client_2.addr.sin_port = htons(variant->client_2_port);
+	} else {
+		self->len_c2 = sizeof(self->client_2.addr6);
+		self->client_2.addr6.sin6_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr6.sin6_addr);
+		self->client_2.addr6.sin6_port = htons(variant->client_2_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_1_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_1_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns12");
+	self->client_2_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_2_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		self->c1fd = socket(variant->af_client_1, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c1fd, SOL_SOCKET, SO_REUSEPORT, &optval,
+		   sizeof(optval)), -1);
+		if (variant->af_client_1 == AF_INET) {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr,
+					      &self->len_c1), 0);
+		} else {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr6,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr6,
+					      &self->len_c1), 0);
+		}
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		self->c2fd = socket(variant->af_client_2, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c2fd, SOL_SOCKET, SO_REUSEPORT, &optval,
+		   sizeof(optval)), -1);
+		if (variant->af_client_2 == AF_INET) {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr,
+					      &self->len_c2), 0);
+		} else {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr6,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr6,
+					      &self->len_c2), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+	   sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_data)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+	close(self->c1fd);
+	ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+	close(self->c2fd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_data, send_fail_no_flow)
+{
+	char const *test_str = "test_read";
+	int send_len = 10;
+
+	ASSERT_EQ(strlen(test_str) + 1, send_len);
+	EXPECT_EQ(sendto(self->sfd, test_str, send_len, 0,
+			 &self->client_1.addr, self->len_c1), -1);
+};
+
+TEST_F(quic_data, encrypt_two_conn_gso_1200_iov_2_size_9000_aesgcm128)
+{
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_connection_info conn_1_info;
+	struct quic_connection_info conn_2_info;
+	struct quic_tx_ancillary_data *anc_data;
+	socklen_t recv_addr_len_1;
+	socklen_t recv_addr_len_2;
+	struct cmsghdr *cmsg_hdr;
+	int frag_size = 1200;
+	int send_len = 9000;
+	struct iovec iov[2];
+	int msg_len = 4500;
+	struct msghdr msg;
+	char *test_str_1;
+	char *test_str_2;
+	char *buf_1;
+	char *buf_2;
+	int i;
+
+	test_str_1 = (char *)malloc(9000);
+	test_str_2 = (char *)malloc(9000);
+	memset(test_str_1, 0, 9000);
+	memset(test_str_2, 0, 9000);
+
+	buf_1 = (char *)malloc(10000);
+	buf_2 = (char *)malloc(10000);
+	for (i = 0; i < 9000; i += (1200 - 16)) {
+		test_str_1[i] = 0x40;
+		memcpy(&test_str_1[i + 1], &variant->conn_id_1,
+		       variant->conn_id_1_len);
+		test_str_1[i + 1 + variant->conn_id_1_len] = 0xca;
+
+		test_str_2[i] = 0x40;
+		memcpy(&test_str_2[i + 1], &variant->conn_id_2,
+		       variant->conn_id_2_len);
+		test_str_2[i + 1 + variant->conn_id_2_len] = 0xca;
+	}
+
+	// program the connection into the offload
+	conn_1_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_1_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_1_info.key.conn_id_length = variant->conn_id_1_len;
+	memcpy(conn_1_info.key.conn_id,
+	       &variant->conn_id_1,
+	       variant->conn_id_1_len);
+
+	conn_2_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_2_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_2_info.key.conn_id_length = variant->conn_id_2_len;
+	memcpy(conn_2_info.key.conn_id,
+	       &variant->conn_id_2,
+	       variant->conn_id_2_len);
+
+	memcpy(&conn_1_info.aes_gcm_128.payload_key,
+	       &variant->conn_1_key, 16);
+	memcpy(&conn_1_info.aes_gcm_128.payload_iv,
+	       &variant->conn_1_iv, 12);
+	memcpy(&conn_1_info.aes_gcm_128.header_key,
+	       &variant->conn_1_hdr_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_key,
+	       &variant->conn_2_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_iv,
+	       &variant->conn_2_iv, 12);
+	memcpy(&conn_2_info.aes_gcm_128.header_key,
+	       &variant->conn_2_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+
+	if (variant->setup_flow_1)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)), 0);
+
+	if (variant->setup_flow_2)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)), 0);
+
+	recv_addr_len_1 = self->len_c1;
+	recv_addr_len_2 = self->len_c2;
+
+	iov[0].iov_base = test_str_1;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_1 + 4500;
+	iov[1].iov_len = msg_len;
+
+	msg.msg_name = (self->client_1.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_1.addr
+		       : (void *)&self->client_1.addr6;
+	msg.msg_namelen = self->len_c1;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->flags = 0;
+	anc_data->conn_id_length = variant->conn_id_1_len;
+
+	if (variant->use_client_1)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	iov[0].iov_base = test_str_2;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_2 + 4500;
+	iov[1].iov_len = msg_len;
+	msg.msg_name = (self->client_2.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_2.addr
+		       : (void *)&self->client_2.addr6;
+	msg.msg_namelen = self->len_c2;
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_2_len;
+	anc_data->flags = 0;
+
+	if (variant->use_client_2)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		if (variant->af_client_1 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr,
+						   &recv_addr_len_1),
+					  1200);
+				// Validate framing is intact.
+				EXPECT_EQ(memcmp((void *)buf_1 + 1,
+						 &variant->conn_id_1,
+						 variant->conn_id_1_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr6,
+						   &recv_addr_len_1),
+					1200);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr6,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_1, test_str_1, send_len), 0);
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		if (variant->af_client_2 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr6,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr6,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_2, test_str_2, send_len), 0);
+	}
+
+	if (variant->use_client_1 && variant->use_client_2)
+		EXPECT_NE(memcmp(buf_1, buf_2, send_len), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	if (variant->setup_flow_1) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)),
+			  0);
+	}
+	if (variant->setup_flow_2) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)),
+			  0);
+	}
+	free(test_str_1);
+	free(test_str_2);
+	free(buf_1);
+	free(buf_2);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+// 3. QUIC Encryption Tests
+
+FIXTURE(quic_crypto)
+{
+	int sfd, cfd;
+	socklen_t len_c;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_crypto)
+{
+	unsigned int af_client;
+	char *client_address;
+	unsigned short client_port;
+	uint32_t algo;
+	uint8_t conn_id[8];
+	uint8_t conn_key[16];
+	uint8_t conn_iv[12];
+	uint8_t conn_hdr_key[16];
+	size_t conn_id_len;
+	bool setup_flow;
+	bool use_client;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_SETUP(quic_crypto)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client == AF_INET) {
+		self->len_c = sizeof(self->client.addr);
+		self->client.addr.sin_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr.sin_addr);
+		self->client.addr.sin_port = htons(variant->client_port);
+	} else {
+		self->len_c = sizeof(self->client.addr6);
+		self->client.addr6.sin6_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr6.sin6_addr);
+		self->client.addr6.sin6_port = htons(variant->client_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client) {
+		ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+		self->cfd = socket(variant->af_client, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->cfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			sizeof(optval)), -1);
+		if (variant->af_client == AF_INET) {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr,
+					      &self->len_c), 0);
+		} else {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr6,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr6,
+					      &self->len_c), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+	   sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s),
+			  0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s),
+			  0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_crypto)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	close(self->cfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7667,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {0x87, 0x71, 0xEA, 0x1D, 0xFB, 0xBE, 0x7A, 0x45, 0xBB,
+		0xE2, 0x7E, 0xBC, 0x0B, 0x53, 0x94, 0x99},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {0xC9, 0x8E, 0xFD, 0xF2, 0x0B, 0x64, 0x8C, 0x57,
+		0xB5, 0x0A, 0xB2, 0xD2, 0x21, 0xD3, 0x66, 0xA5},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7669,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7673,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {0x87, 0x71, 0xEA, 0x1D, 0xFB, 0xBE, 0x7A, 0x45, 0xBB,
+		0xE2, 0x7E, 0xBC, 0x0B, 0x53, 0x94, 0x99},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {0xC9, 0x8E, 0xFD, 0xF2, 0x0B, 0x64, 0x8C, 0x57,
+		0xB5, 0x0A, 0xB2, 0xD2, 0x21, 0xD3, 0x66, 0xA5},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7675,
+};
+
+TEST_F(quic_crypto, encrypt_test_vector_aesgcm128_single_flow_gso_in_control)
+{
+	char test_str[37] = {// Header, conn id and pkt num
+			     0x40, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0xCA,
+			     // Payload
+			     0x02, 0x80, 0xDE, 0x40, 0x39, 0x40, 0xF6, 0x00,
+			     0x01, 0x0B, 0x00, 0x0F, 0x65, 0x63, 0x68, 0x6F,
+			     0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+			     0x37, 0x38, 0x39
+	};
+
+	char match_str[53] = {
+			     0x46, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0x1C, 0x44, 0xB8, 0x41, 0xBB, 0xCF, 0x6E,
+			     0x0A, 0x2A, 0x24, 0xFB, 0xB4, 0x79, 0x62, 0xEA,
+			     0x59, 0x38, 0x1A, 0x0E, 0x50, 0x1E, 0x59, 0xED,
+			     0x3F, 0x8E, 0x7E, 0x5A, 0x70, 0xE4, 0x2A, 0xBC,
+			     0x2A, 0xFA, 0x2B, 0x54, 0xEB, 0x89, 0xC3, 0x2C,
+			     0xB6, 0x8C, 0x1E, 0xAB, 0x2D
+	};
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)
+			 + CMSG_SPACE(sizeof(uint16_t))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int send_len = sizeof(test_str);
+	int msg_len = sizeof(test_str);
+	uint16_t frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	int wrong_frag_size = 26;
+	socklen_t recv_addr_len;
+	struct iovec iov[2];
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(1024);
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	memcpy(&conn_info.aes_gcm_128.payload_key,
+	       &variant->conn_key, 16);
+	memcpy(&conn_info.aes_gcm_128.payload_iv,
+	       &variant->conn_iv, 12);
+	memcpy(&conn_info.aes_gcm_128.header_key,
+	       &variant->conn_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &wrong_frag_size,
+			     sizeof(wrong_frag_size)), 0);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+
+	iov[0].iov_base = test_str;
+	iov[0].iov_len = msg_len;
+
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_len;
+
+	cmsg_hdr = CMSG_NXTHDR(&msg, cmsg_hdr);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_SEGMENT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+	memcpy(CMSG_DATA(cmsg_hdr), (void *)&frag_size, sizeof(frag_size));
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr, &recv_addr_len),
+			  sizeof(match_str));
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  sizeof(match_str));
+	}
+	EXPECT_EQ(memcmp(buf, match_str, sizeof(match_str)), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_crypto, encrypt_test_vector_aesgcm128_single_flow_gso_in_setsockopt)
+{
+	char test_str[37] = {// Header, conn id and pkt num
+			     0x40, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0xCA,
+			     // Payload
+			     0x02, 0x80, 0xDE, 0x40, 0x39, 0x40, 0xF6, 0x00,
+			     0x01, 0x0B, 0x00, 0x0F, 0x65, 0x63, 0x68, 0x6F,
+			     0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+			     0x37, 0x38, 0x39
+	};
+
+	char match_str[53] = {
+			     0x46, 0x08, 0x6B, 0xBF, 0x88, 0x82, 0xB9, 0x12,
+			     0x49, 0x1C, 0x44, 0xB8, 0x41, 0xBB, 0xCF, 0x6E,
+			     0x0A, 0x2A, 0x24, 0xFB, 0xB4, 0x79, 0x62, 0xEA,
+			     0x59, 0x38, 0x1A, 0x0E, 0x50, 0x1E, 0x59, 0xED,
+			     0x3F, 0x8E, 0x7E, 0x5A, 0x70, 0xE4, 0x2A, 0xBC,
+			     0x2A, 0xFA, 0x2B, 0x54, 0xEB, 0x89, 0xC3, 0x2C,
+			     0xB6, 0x8C, 0x1E, 0xAB, 0x2D
+	};
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int send_len = sizeof(test_str);
+	int msg_len = sizeof(test_str);
+	struct cmsghdr *cmsg_hdr;
+	socklen_t recv_addr_len;
+	int frag_size = 1200;
+	struct iovec iov[2];
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(1024);
+
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(&conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	memcpy(&conn_info.aes_gcm_128.payload_key,
+	       &variant->conn_key, 16);
+	memcpy(&conn_info.aes_gcm_128.payload_iv,
+	       &variant->conn_iv, 12);
+	memcpy(&conn_info.aes_gcm_128.header_key,
+	       &variant->conn_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)),
+		  0);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)),
+		  0);
+
+	recv_addr_len = self->len_c;
+
+	iov[0].iov_base = test_str;
+	iov[0].iov_len = msg_len;
+
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_len;
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr, &recv_addr_len),
+			  sizeof(match_str));
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 1024, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  sizeof(match_str));
+	}
+	EXPECT_STREQ(buf, match_str);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/quic.sh b/tools/testing/selftests/net/quic.sh
new file mode 100755
index 000000000000..6c684e670e82
--- /dev/null
+++ b/tools/testing/selftests/net/quic.sh
@@ -0,0 +1,45 @@
+#!/bin/bash
+
+sudo ip netns add ns11
+sudo ip netns add ns12
+sudo ip netns add ns2
+sudo ip link add veth11 type veth peer name br-veth11
+sudo ip link add veth12 type veth peer name br-veth12
+sudo ip link add veth2 type veth peer name br-veth2
+sudo ip link set veth11 netns ns11
+sudo ip link set veth12 netns ns12
+sudo ip link set veth2 netns ns2
+sudo ip netns exec ns11 ip addr add 10.0.0.1/24 dev veth11
+sudo ip netns exec ns11 ip addr add ::ffff:10.0.0.1/96 dev veth11
+sudo ip netns exec ns11 ip addr add 2001::1/64 dev veth11
+sudo ip netns exec ns12 ip addr add 10.0.0.3/24 dev veth12
+sudo ip netns exec ns12 ip addr add ::ffff:10.0.0.3/96 dev veth12
+sudo ip netns exec ns12 ip addr add 2001::3/64 dev veth12
+sudo ip netns exec ns2 ip addr add 10.0.0.2/24 dev veth2
+sudo ip netns exec ns2 ip addr add ::ffff:10.0.0.2/96 dev veth2
+sudo ip netns exec ns2 ip addr add 2001::2/64 dev veth2
+sudo ip link add name br1 type bridge forward_delay 0
+sudo ip link set br1 up
+sudo ip link set br-veth11 up
+sudo ip link set br-veth12 up
+sudo ip link set br-veth2 up
+sudo ip netns exec ns11 ip link set veth11 up
+sudo ip netns exec ns12 ip link set veth12 up
+sudo ip netns exec ns2 ip link set veth2 up
+sudo ip link set br-veth11 master br1
+sudo ip link set br-veth12 master br1
+sudo ip link set br-veth2 master br1
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+
+printf "%s" "Waiting for bridge to start forwarding ..."
+while ! timeout 0.5 sudo ip netns exec ns2 ping -c 1 -n 2001::1 &> /dev/null
+do
+	printf "%c" "."
+done
+printf "\n%s\n"  "Bridge is operational"
+
+sudo ./quic
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+sudo ip netns delete ns2
+sudo ip netns delete ns12
+sudo ip netns delete ns11
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [RFC net-next v2 1/6] Documentation on QUIC kernel Tx crypto.
  2022-08-06  0:11   ` [RFC net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-08-06  3:05     ` Bagas Sanjaya
  2022-08-08 19:05       ` Adel Abouchaev
  0 siblings, 1 reply; 77+ messages in thread
From: Bagas Sanjaya @ 2022-08-06  3:05 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: kuba, davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, kernel test robot

[-- Attachment #1: Type: text/plain, Size: 8323 bytes --]

On Fri, Aug 05, 2022 at 05:11:48PM -0700, Adel Abouchaev wrote:
> Adding Documentation/networking/quic.rst file to describe kernel QUIC
> code.
> 

Better say "Add documentation for kernel QUIC code".

> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> index 03b215bddde8..656fa1dac26b 100644
> --- a/Documentation/networking/index.rst
> +++ b/Documentation/networking/index.rst
> @@ -90,6 +90,7 @@ Contents:
>     plip
>     ppp_generic
>     proc_net_tcp
> +   quic
>     radiotap-headers
>     rds
>     regulatory
> diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
> new file mode 100644
> index 000000000000..416099b80e60
> --- /dev/null
> +++ b/Documentation/networking/quic.rst
> @@ -0,0 +1,186 @@
> +.. _kernel_quic:
> +
> +===========
> +KERNEL QUIC
> +===========
> +
> +Overview
> +========
> +
> +QUIC is a secure general-purpose transport protocol that creates a stateful
> +interaction between a client and a server. QUIC provides end-to-end integrity
> +and confidentiality. Refer to RFC 9000 for more information on QUIC.
> +
> +The kernel Tx side offload covers the encryption of the application streams
> +in the kernel rather than in the application. These packets are 1RTT packets
> +in QUIC connection. Encryption of every other packets is still done by the
> +QUIC library in user space.
> +
> +
> +
> +User Interface
> +==============
> +
> +Creating a QUIC connection
> +--------------------------
> +
> +QUIC connection originates and terminates in the application, using one of many
> +available QUIC libraries. The code instantiates QUIC client and QUIC server in
> +some form and configures them to use certain addresses and ports for the
> +source and destination. The client and server negotiate the set of keys to
> +protect the communication during different phases of the connection, maintain
> +the connection and perform congestion control.
> +
> +Requesting to add QUIC Tx kernel encryption to the connection
> +-------------------------------------------------------------
> +
> +Each flow that should be encrypted by the kernel needs to be registered with
> +the kernel using socket API. A setsockopt() call on the socket creates an
> +association between the QUIC connection ID of the flow with the encryption
> +parameters for the crypto operations:
> +
> +.. code-block:: c
> +
> +	struct quic_connection_info conn_info;
> +	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
> +	const size_t conn_id_len = sizeof(conn_id);
> +	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> +			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
> +	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> +			    0x08, 0x09, 0x0a, 0x0b};
> +	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
> +				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
> +				};
> +
> +	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
> +
> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
> +	conn_info.key.conn_id_length = 5;
> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
> +				      - conn_id_len],
> +	       &conn_id, conn_id_len);
> +
> +	memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
> +	memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
> +	memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
> +
> +	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
> +		   sizeof(conn_info));
> +
> +
> +Requesting to remove QUIC Tx kernel crypto offload control messages
> +-------------------------------------------------------------------
> +
> +All flows are removed when the socket is closed. To request an explicit remove
> +of the offload for the connection during the lifetime of the socket the process
> +is similar to adding the flow. Only the connection ID and its length are
> +necessary to supply to remove the connection from the offload:
> +
> +.. code-block:: c
> +
> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
> +	conn_info.key.conn_id_length = 5;
> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
> +				      - conn_id_len],
> +	       &conn_id, conn_id_len);
> +	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
> +		   sizeof(conn_info));
> +
> +Sending QUIC application data
> +-----------------------------
> +
> +For QUIC Tx encryption offload, the application should use sendmsg() socket
> +call and provide ancillary data with information on connection ID length and
> +offload flags for the kernel to perform the encryption and GSO support if
> +requested.
> +
> +.. code-block:: c
> +
> +	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
> +	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
> +	struct quic_tx_ancillary_data * anc_data;
> +	size_t quic_data_len = 4500;
> +	struct cmsghdr * cmsg_hdr;
> +	char quic_data[9000];
> +	struct iovec iov[2];
> +	int send_len = 9000;
> +	struct msghdr msg;
> +	int err;
> +
> +	iov[0].iov_base = quic_data;
> +	iov[0].iov_len = quic_data_len;
> +	iov[1].iov_base = quic_data + 4500;
> +	iov[1].iov_len = quic_data_len;
> +
> +	if (client.addr.sin_family == AF_INET) {
> +		msg.msg_name = &client.addr;
> +		msg.msg_namelen = sizeof(client.addr);
> +	} else {
> +		msg.msg_name = &client.addr6;
> +		msg.msg_namelen = sizeof(client.addr6);
> +	}
> +
> +	msg.msg_iov = iov;
> +	msg.msg_iovlen = 2;
> +	msg.msg_control = cmsg_buf;
> +	msg.msg_controllen = sizeof(cmsg_buf);
> +	cmsg_hdr = CMSG_FIRSTHDR(&msg);
> +	cmsg_hdr->cmsg_level = IPPROTO_UDP;
> +	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
> +	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
> +	anc_data = CMSG_DATA(cmsg_hdr);
> +	anc_data->flags = 0;
> +	anc_data->next_pkt_num = 0x0d65c9;
> +	anc_data->conn_id_length = conn_id_len;
> +	err = sendmsg(self->sfd, &msg, 0);
> +
> +QUIC Tx offload in kernel will read the data from userspace, encrypt and
> +copy it to the ciphertext within the same operation.
> +
> +
> +Sending QUIC application data with GSO
> +--------------------------------------
> +When GSO is in use, the kernel will use the GSO fragment size as the target
> +for ciphertext. The packets from the user space should align on the boundary
> +of GSO fragment size minus the size of the tag for the chosen cipher. For the
> +GSO fragment 1200, the plain packets should follow each other at every 1184
> +bytes, given the tag size of 16. After the encryption, the rest of the UDP
> +and IP stacks will follow the defined value of GSO fragment which will include
> +the trailing tag bytes.
> +
> +To set up GSO fragmentation:
> +
> +.. code-block:: c
> +
> +	setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
> +		   sizeof(frag_size));
> +
> +If the GSO fragment size is provided in ancillary data within the sendmsg()
> +call, the value in ancillary data will take precedence over the segment size
> +provided in setsockopt to split the payload into packets. This is consistent
> +with the UDP stack behavior.
> +
> +Integrating to userspace QUIC libraries
> +---------------------------------------
> +
> +Userspace QUIC libraries integration would depend on the implementation of the
> +QUIC protocol. For MVFST library, the control plane is integrated into the
> +handshake callbacks to properly configure the flows into the socket; and the
> +data plane is integrated into the methods that perform encryption and send
> +the packets to the batch scheduler for transmissions to the socket.
> +
> +MVFST library can be found at https://github.com/facebookincubator/mvfst.
> +
> +Statistics
> +==========
> +
> +QUIC Tx offload to the kernel has counters
> +(``/proc/net/quic_stat``):
> +
> +- ``QuicCurrTxSw`` -
> +  number of currently active kernel offloaded QUIC connections
> +- ``QuicTxSw`` -
> +  accumulative total number of offloaded QUIC connections
> +- ``QuicTxSwError`` -
> +  accumulative total number of errors during QUIC Tx offload to kernel
> +

The documentation looks OK (no new warnings).

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 273 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC net-next v2 1/6] Documentation on QUIC kernel Tx crypto.
  2022-08-06  3:05     ` Bagas Sanjaya
@ 2022-08-08 19:05       ` Adel Abouchaev
  0 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-08 19:05 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: kuba, davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, kernel test robot

Updated the commit message, will be visible in v3.

On 8/5/22 8:05 PM, Bagas Sanjaya wrote:
> On Fri, Aug 05, 2022 at 05:11:48PM -0700, Adel Abouchaev wrote:
>> Adding Documentation/networking/quic.rst file to describe kernel QUIC
>> code.
>>
> Better say "Add documentation for kernel QUIC code".
>
>> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
>> index 03b215bddde8..656fa1dac26b 100644
>> --- a/Documentation/networking/index.rst
>> +++ b/Documentation/networking/index.rst
>> @@ -90,6 +90,7 @@ Contents:
>>      plip
>>      ppp_generic
>>      proc_net_tcp
>> +   quic
>>      radiotap-headers
>>      rds
>>      regulatory
>> diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
>> new file mode 100644
>> index 000000000000..416099b80e60
>> --- /dev/null
>> +++ b/Documentation/networking/quic.rst
>> @@ -0,0 +1,186 @@
>> +.. _kernel_quic:
>> +
>> +===========
>> +KERNEL QUIC
>> +===========
>> +
>> +Overview
>> +========
>> +
>> +QUIC is a secure general-purpose transport protocol that creates a stateful
>> +interaction between a client and a server. QUIC provides end-to-end integrity
>> +and confidentiality. Refer to RFC 9000 for more information on QUIC.
>> +
>> +The kernel Tx side offload covers the encryption of the application streams
>> +in the kernel rather than in the application. These packets are 1RTT packets
>> +in QUIC connection. Encryption of every other packets is still done by the
>> +QUIC library in user space.
>> +
>> +
>> +
>> +User Interface
>> +==============
>> +
>> +Creating a QUIC connection
>> +--------------------------
>> +
>> +QUIC connection originates and terminates in the application, using one of many
>> +available QUIC libraries. The code instantiates QUIC client and QUIC server in
>> +some form and configures them to use certain addresses and ports for the
>> +source and destination. The client and server negotiate the set of keys to
>> +protect the communication during different phases of the connection, maintain
>> +the connection and perform congestion control.
>> +
>> +Requesting to add QUIC Tx kernel encryption to the connection
>> +-------------------------------------------------------------
>> +
>> +Each flow that should be encrypted by the kernel needs to be registered with
>> +the kernel using socket API. A setsockopt() call on the socket creates an
>> +association between the QUIC connection ID of the flow with the encryption
>> +parameters for the crypto operations:
>> +
>> +.. code-block:: c
>> +
>> +	struct quic_connection_info conn_info;
>> +	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
>> +	const size_t conn_id_len = sizeof(conn_id);
>> +	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
>> +			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
>> +	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
>> +			    0x08, 0x09, 0x0a, 0x0b};
>> +	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
>> +				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
>> +				};
>> +
>> +	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
>> +
>> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
>> +	conn_info.key.conn_id_length = 5;
>> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
>> +				      - conn_id_len],
>> +	       &conn_id, conn_id_len);
>> +
>> +	memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
>> +	memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
>> +	memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
>> +
>> +	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
>> +		   sizeof(conn_info));
>> +
>> +
>> +Requesting to remove QUIC Tx kernel crypto offload control messages
>> +-------------------------------------------------------------------
>> +
>> +All flows are removed when the socket is closed. To request an explicit remove
>> +of the offload for the connection during the lifetime of the socket the process
>> +is similar to adding the flow. Only the connection ID and its length are
>> +necessary to supply to remove the connection from the offload:
>> +
>> +.. code-block:: c
>> +
>> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
>> +	conn_info.key.conn_id_length = 5;
>> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
>> +				      - conn_id_len],
>> +	       &conn_id, conn_id_len);
>> +	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
>> +		   sizeof(conn_info));
>> +
>> +Sending QUIC application data
>> +-----------------------------
>> +
>> +For QUIC Tx encryption offload, the application should use sendmsg() socket
>> +call and provide ancillary data with information on connection ID length and
>> +offload flags for the kernel to perform the encryption and GSO support if
>> +requested.
>> +
>> +.. code-block:: c
>> +
>> +	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
>> +	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
>> +	struct quic_tx_ancillary_data * anc_data;
>> +	size_t quic_data_len = 4500;
>> +	struct cmsghdr * cmsg_hdr;
>> +	char quic_data[9000];
>> +	struct iovec iov[2];
>> +	int send_len = 9000;
>> +	struct msghdr msg;
>> +	int err;
>> +
>> +	iov[0].iov_base = quic_data;
>> +	iov[0].iov_len = quic_data_len;
>> +	iov[1].iov_base = quic_data + 4500;
>> +	iov[1].iov_len = quic_data_len;
>> +
>> +	if (client.addr.sin_family == AF_INET) {
>> +		msg.msg_name = &client.addr;
>> +		msg.msg_namelen = sizeof(client.addr);
>> +	} else {
>> +		msg.msg_name = &client.addr6;
>> +		msg.msg_namelen = sizeof(client.addr6);
>> +	}
>> +
>> +	msg.msg_iov = iov;
>> +	msg.msg_iovlen = 2;
>> +	msg.msg_control = cmsg_buf;
>> +	msg.msg_controllen = sizeof(cmsg_buf);
>> +	cmsg_hdr = CMSG_FIRSTHDR(&msg);
>> +	cmsg_hdr->cmsg_level = IPPROTO_UDP;
>> +	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
>> +	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
>> +	anc_data = CMSG_DATA(cmsg_hdr);
>> +	anc_data->flags = 0;
>> +	anc_data->next_pkt_num = 0x0d65c9;
>> +	anc_data->conn_id_length = conn_id_len;
>> +	err = sendmsg(self->sfd, &msg, 0);
>> +
>> +QUIC Tx offload in kernel will read the data from userspace, encrypt and
>> +copy it to the ciphertext within the same operation.
>> +
>> +
>> +Sending QUIC application data with GSO
>> +--------------------------------------
>> +When GSO is in use, the kernel will use the GSO fragment size as the target
>> +for ciphertext. The packets from the user space should align on the boundary
>> +of GSO fragment size minus the size of the tag for the chosen cipher. For the
>> +GSO fragment 1200, the plain packets should follow each other at every 1184
>> +bytes, given the tag size of 16. After the encryption, the rest of the UDP
>> +and IP stacks will follow the defined value of GSO fragment which will include
>> +the trailing tag bytes.
>> +
>> +To set up GSO fragmentation:
>> +
>> +.. code-block:: c
>> +
>> +	setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
>> +		   sizeof(frag_size));
>> +
>> +If the GSO fragment size is provided in ancillary data within the sendmsg()
>> +call, the value in ancillary data will take precedence over the segment size
>> +provided in setsockopt to split the payload into packets. This is consistent
>> +with the UDP stack behavior.
>> +
>> +Integrating to userspace QUIC libraries
>> +---------------------------------------
>> +
>> +Userspace QUIC libraries integration would depend on the implementation of the
>> +QUIC protocol. For MVFST library, the control plane is integrated into the
>> +handshake callbacks to properly configure the flows into the socket; and the
>> +data plane is integrated into the methods that perform encryption and send
>> +the packets to the batch scheduler for transmissions to the socket.
>> +
>> +MVFST library can be found at https://github.com/facebookincubator/mvfst.
>> +
>> +Statistics
>> +==========
>> +
>> +QUIC Tx offload to the kernel has counters
>> +(``/proc/net/quic_stat``):
>> +
>> +- ``QuicCurrTxSw`` -
>> +  number of currently active kernel offloaded QUIC connections
>> +- ``QuicTxSw`` -
>> +  accumulative total number of offloaded QUIC connections
>> +- ``QuicTxSwError`` -
>> +  accumulative total number of errors during QUIC Tx offload to kernel
>> +
> The documentation looks OK (no new warnings).
>
> Thanks.
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [net-next 0/6] net: support QUIC crypto
       [not found] <Adel Abouchaev <adel.abushaev@gmail.com>
                   ` (2 preceding siblings ...)
  2022-08-06  0:11 ` [RFC net-next v2 0/6] net: support QUIC crypto Adel Abouchaev
@ 2022-08-16 18:11 ` Adel Abouchaev
  2022-08-16 18:11   ` [net-next 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
                     ` (6 more replies)
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
                   ` (3 subsequent siblings)
  7 siblings, 7 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-16 18:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

QUIC requires end to end encryption of the data. The application usually
prepares the data in clear text, encrypts and calls send() which implies
multiple copies of the data before the packets hit the networking stack.
Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
pressure by reducing the number of copies.

The scope of kernel support is limited to the symmetric cryptography,
leaving the handshake to the user space library. For QUIC in particular,
the application packets that require symmetric cryptography are the 1RTT
packets with short headers. Kernel will encrypt the application packets
on transmission and decrypt on receive. This series implements Tx only,
because in QUIC server applications Tx outweighs Rx by orders of
magnitude.

Supporting the combination of QUIC and GSO requires the application to
correctly place the data and the kernel to correctly slice it. The
encryption process appends an arbitrary number of bytes (tag) to the end
of the message to authenticate it. The GSO value should include this
overhead, the offload would then subtract the tag size to parse the
input on Tx before chunking and encrypting it.

With the kernel cryptography, the buffer copy operation is conjoined
with the encryption operation. The memory bandwidth is reduced by 5-8%.
When devices supporting QUIC encryption in hardware come to the market,
we will be able to free further 7% of CPU utilization which is used
today for crypto operations.


Adel Abouchaev (6):
  Documentation on QUIC kernel Tx crypto.
  Define QUIC specific constants, control and data plane structures
  Add UDP ULP operations, initialization and handling prototype
    functions.
  Implement QUIC offload functions
  Add flow counters and Tx processing error counter
  Add self tests for ULP operations, flow setup and crypto tests

 Documentation/networking/index.rst     |    1 +
 Documentation/networking/quic.rst      |  186 ++++
 include/net/inet_sock.h                |    2 +
 include/net/netns/mib.h                |    3 +
 include/net/quic.h                     |   63 ++
 include/net/snmp.h                     |    6 +
 include/net/udp.h                      |   33 +
 include/uapi/linux/quic.h              |   61 +
 include/uapi/linux/snmp.h              |    9 +
 include/uapi/linux/udp.h               |    4 +
 net/Kconfig                            |    1 +
 net/Makefile                           |    1 +
 net/ipv4/Makefile                      |    3 +-
 net/ipv4/udp.c                         |   15 +
 net/ipv4/udp_ulp.c                     |  192 ++++
 net/quic/Kconfig                       |   16 +
 net/quic/Makefile                      |    8 +
 net/quic/quic_main.c                   | 1417 ++++++++++++++++++++++++
 net/quic/quic_proc.c                   |   45 +
 tools/testing/selftests/net/.gitignore |    3 +-
 tools/testing/selftests/net/Makefile   |    3 +-
 tools/testing/selftests/net/quic.c     | 1153 +++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   46 +
 23 files changed, 3268 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/quic.rst
 create mode 100644 include/net/quic.h
 create mode 100644 include/uapi/linux/quic.h
 create mode 100644 net/ipv4/udp_ulp.c
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c
 create mode 100644 net/quic/quic_proc.c
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

--
2.30.2


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [net-next 1/6] Documentation on QUIC kernel Tx crypto.
  2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
@ 2022-08-16 18:11   ` Adel Abouchaev
  2022-08-16 18:11   ` [net-next 2/6] Define QUIC specific constants, control and data plane structures Adel Abouchaev
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-16 18:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, kernel test robot

Add documentation for kernel QUIC code.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Added a quic.rst reference to the index.rst file; fixed indentation in the
quic.rst file.
Reported-by: kernel test robot <lkp@intel.com>

Added SPDX license GPL 2.0.
---
 Documentation/networking/index.rst |   1 +
 Documentation/networking/quic.rst  | 186 +++++++++++++++++++++++++++++
 2 files changed, 187 insertions(+)
 create mode 100644 Documentation/networking/quic.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 03b215bddde8..656fa1dac26b 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -90,6 +90,7 @@ Contents:
    plip
    ppp_generic
    proc_net_tcp
+   quic
    radiotap-headers
    rds
    regulatory
diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..127802d27a42
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,186 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========
+KERNEL QUIC
+===========
+
+Overview
+========
+
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 for more information on QUIC.
+
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are the 1RTT
+packets of a QUIC connection. Encryption of all other packets is still done
+by the QUIC library in user space.
+
+
+
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+
+A QUIC connection originates and terminates in the application, using one of
+the many available QUIC libraries. The code instantiates a QUIC client and a
+QUIC server in some form and configures them to use certain addresses and
+ports for the source and destination. The client and server negotiate the set
+of keys to protect the communication during different phases of the
+connection, maintain the connection and perform congestion control.
+
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow that should be encrypted by the kernel needs to be registered with
+the kernel using the socket API. A setsockopt() call on the socket creates an
+association between the QUIC connection ID of the flow and the encryption
+parameters for the crypto operations:
+
+.. code-block:: c
+
+	struct quic_connection_info conn_info;
+	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+	const size_t conn_id_len = sizeof(conn_id);
+	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			    0x08, 0x09, 0x0a, 0x0b};
+	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
+				};
+
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = 5;
+	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+				      - conn_id_len],
+	       &conn_id, conn_id_len);
+
+	memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
+	memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
+	memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
+
+	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+
+Requesting to remove QUIC Tx kernel crypto offload control messages
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. Explicitly removing the
+offload for a connection during the lifetime of the socket is similar to
+adding the flow. Only the connection ID and its length need to be supplied
+to remove the connection from the offload:
+
+.. code-block:: c
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = 5;
+	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+				      - conn_id_len],
+	       &conn_id, conn_id_len);
+	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+Sending QUIC application data
+-----------------------------
+
+For QUIC Tx encryption offload, the application should use the sendmsg()
+socket call and provide ancillary data with the connection ID length and
+offload flags, so that the kernel can perform the encryption and, if
+requested, GSO.
+
+.. code-block:: c
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_tx_ancillary_data *anc_data;
+	size_t quic_data_len = 4500;
+	struct cmsghdr *cmsg_hdr;
+	char quic_data[9000];
+	struct iovec iov[2];
+	int send_len = 9000;
+	struct msghdr msg;
+	int err;
+
+	iov[0].iov_base = quic_data;
+	iov[0].iov_len = quic_data_len;
+	iov[1].iov_base = quic_data + 4500;
+	iov[1].iov_len = quic_data_len;
+
+	if (client.addr.sin_family == AF_INET) {
+		msg.msg_name = &client.addr;
+		msg.msg_namelen = sizeof(client.addr);
+	} else {
+		msg.msg_name = &client.addr6;
+		msg.msg_namelen = sizeof(client.addr6);
+	}
+
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = conn_id_len;
+	err = sendmsg(fd, &msg, 0);
+
+The QUIC Tx offload in the kernel reads the data from user space and encrypts
+it, combining the copy and the encryption into a single operation.
+
+
+Sending QUIC application data with GSO
+--------------------------------------
+
+When GSO is in use, the kernel uses the GSO fragment size as the target size
+for the ciphertext. The packets from user space should be aligned on a
+boundary of the GSO fragment size minus the size of the tag for the chosen
+cipher. For a GSO fragment size of 1200 and a tag size of 16, the plaintext
+packets should follow each other every 1184 bytes. After the encryption, the
+rest of the UDP and IP stacks will use the configured GSO fragment size,
+which includes the trailing tag bytes.
+
+To set up GSO fragmentation:
+
+.. code-block:: c
+
+	setsockopt(fd, SOL_UDP, UDP_SEGMENT, &frag_size, sizeof(frag_size));
+
+If the GSO fragment size is provided in ancillary data within the sendmsg()
+call, the value in ancillary data will take precedence over the segment size
+provided in setsockopt to split the payload into packets. This is consistent
+with the UDP stack behavior.
+
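+The sketch below illustrates the data placement arithmetic only; the 16-byte
+tag assumes AES-GCM-128, and the variable names are illustrative rather than
+part of any API:
+
+.. code-block:: c
+
+	uint16_t frag_size = 1200;            /* ciphertext size of one GSO fragment */
+	size_t tag_size = 16;                 /* AEAD tag appended by the kernel */
+	size_t stride = frag_size - tag_size; /* 1184 bytes of plaintext per packet */
+	size_t packet_offset[4];
+	size_t i;
+
+	/* Plaintext packet i starts at offset i * stride in the send buffer;
+	 * after encryption each fragment carries frag_size bytes of UDP payload.
+	 */
+	for (i = 0; i < 4; i++)
+		packet_offset[i] = i * stride;
+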
+Integrating with userspace QUIC libraries
+-----------------------------------------
+
+Integration with userspace QUIC libraries depends on the implementation of
+the QUIC protocol. For the MVFST library, the control plane is integrated
+into the handshake callbacks to properly configure the flows on the socket,
+and the data plane is integrated into the methods that perform encryption and
+send the packets to the batch scheduler for transmission on the socket.
+
+The MVFST library can be found at https://github.com/facebookincubator/mvfst.
+
+Statistics
+==========
+
+QUIC Tx offload to the kernel has counters
+(``/proc/net/quic_stat``):
+
+- ``QuicCurrTxSw`` -
+  number of currently active kernel offloaded QUIC connections
+- ``QuicTxSw`` -
+  cumulative total number of offloaded QUIC connections
+- ``QuicTxSwError`` -
+  cumulative total number of errors during QUIC Tx offload to the kernel
+
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next 2/6] Define QUIC specific constants, control and data plane structures
  2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
  2022-08-16 18:11   ` [net-next 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-08-16 18:11   ` Adel Abouchaev
  2022-08-16 18:11   ` [net-next 3/6] Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-16 18:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Define control and data plane structures that are passed via the control
plane for flow add/remove and within ancillary data during packet send.
Define constants to use within SOL_UDP to program QUIC sockets.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/uapi/linux/quic.h | 61 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/udp.h  |  3 ++
 2 files changed, 64 insertions(+)
 create mode 100644 include/uapi/linux/quic.h

diff --git a/include/uapi/linux/quic.h b/include/uapi/linux/quic.h
new file mode 100644
index 000000000000..79680b8b18a6
--- /dev/null
+++ b/include/uapi/linux/quic.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_QUIC_H
+#define _UAPI_LINUX_QUIC_H
+
+#include <linux/types.h>
+#include <linux/tls.h>
+
+#define QUIC_MAX_CONNECTION_ID_SIZE 20
+
+/* Sideband (ancillary) data flags for QUIC egress operations */
+#define QUIC_BYPASS_ENCRYPTION 0x01
+
+struct quic_tx_ancillary_data {
+	__aligned_u64	next_pkt_num;
+	__u8	flags;
+	__u8	conn_id_length;
+};
+
+struct quic_connection_info_key {
+	__u8	conn_id[QUIC_MAX_CONNECTION_ID_SIZE];
+	__u8	conn_id_length;
+};
+
+struct quic_aes_gcm_128 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+};
+
+struct quic_aes_gcm_256 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_256_IV_SIZE];
+};
+
+struct quic_aes_ccm_128 {
+	__u8	header_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_CCM_128_IV_SIZE];
+};
+
+struct quic_chacha20_poly1305 {
+	__u8	header_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE];
+};
+
+struct quic_connection_info {
+	__u16	cipher_type;
+	struct quic_connection_info_key		key;
+	union {
+		struct quic_aes_gcm_128 aes_gcm_128;
+		struct quic_aes_gcm_256 aes_gcm_256;
+		struct quic_aes_ccm_128 aes_ccm_128;
+		struct quic_chacha20_poly1305 chacha20_poly1305;
+	};
+};
+
+#endif
+
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 4828794efcf8..0ee4c598e70b 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,9 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
+#define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
+#define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next 3/6] Add UDP ULP operations, initialization and handling prototype functions.
  2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
  2022-08-16 18:11   ` [net-next 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
  2022-08-16 18:11   ` [net-next 2/6] Define QUIC specific constants, control and data plane structures Adel Abouchaev
@ 2022-08-16 18:11   ` Adel Abouchaev
  2022-08-16 18:11   ` [net-next 4/6] Implement QUIC offload functions Adel Abouchaev
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-16 18:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Define functions to add UDP ULP handling, registration with the UDP protocol
and supporting data structures. Create a structure for the QUIC ULP and add
empty prototype functions to support it.
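
A rough sketch of how such a ULP could be registered and attached; the "quic"
name and the init/release callbacks below are illustrative, not taken from
this series:

	static struct udp_ulp_ops quic_udp_ulp_ops = {
		.name    = "quic",
		.owner   = THIS_MODULE,
		.init    = quic_ulp_init,	/* hypothetical */
		.release = quic_ulp_release,	/* hypothetical */
	};

	/* kernel side, at module init time */
	udp_register_ulp(&quic_udp_ulp_ops);

	/* userspace side, attaching the ULP to a UDP socket by name */
	setsockopt(fd, SOL_UDP, UDP_ULP, "quic", sizeof("quic"));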

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Removed reference to net/quic/Kconfig from this patch into the next.

Fixed formatting around brackets.
---
 include/net/inet_sock.h  |   2 +
 include/net/udp.h        |  33 +++++++
 include/uapi/linux/udp.h |   1 +
 net/Makefile             |   1 +
 net/ipv4/Makefile        |   3 +-
 net/ipv4/udp.c           |   6 ++
 net/ipv4/udp_ulp.c       | 192 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 237 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/udp_ulp.c

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index bf5654ce711e..650e332bdb50 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -249,6 +249,8 @@ struct inet_sock {
 	__be32			mc_addr;
 	struct ip_mc_socklist __rcu	*mc_list;
 	struct inet_cork_full	cork;
+	const struct udp_ulp_ops	*udp_ulp_ops;
+	void __rcu		*ulp_data;
 };
 
 #define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
diff --git a/include/net/udp.h b/include/net/udp.h
index 5ee88ddf79c3..f22ebabbb186 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -523,4 +523,37 @@ struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
 int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 #endif
 
+/*
+ * Interface for adding Upper Level Protocols over UDP
+ */
+
+#define UDP_ULP_NAME_MAX	16
+#define UDP_ULP_MAX		128
+
+struct udp_ulp_ops {
+	struct list_head	list;
+
+	/* initialize ulp */
+	int (*init)(struct sock *sk);
+	/* cleanup ulp */
+	void (*release)(struct sock *sk);
+
+	char		name[UDP_ULP_NAME_MAX];
+	struct module	*owner;
+};
+
+int udp_register_ulp(struct udp_ulp_ops *type);
+void udp_unregister_ulp(struct udp_ulp_ops *type);
+int udp_set_ulp(struct sock *sk, const char *name);
+void udp_get_available_ulp(char *buf, size_t len);
+void udp_cleanup_ulp(struct sock *sk);
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval,
+		       unsigned int optlen);
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval,
+		       int __user *optlen);
+
+#define MODULE_ALIAS_UDP_ULP(name)\
+	__MODULE_INFO(alias, alias_userspace, name);\
+	__MODULE_INFO(alias, alias_udp_ulp, "udp-ulp-" name)
+
 #endif	/* _UDP_H */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 0ee4c598e70b..893691f0108a 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_ULP		105	/* Attach ULP to a UDP socket */
 #define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
 #define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
 #define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
diff --git a/net/Makefile b/net/Makefile
index fbfeb8a0bb37..28565bfe29cb 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -16,6 +16,7 @@ obj-y				+= ethernet/ 802/ sched/ netlink/ bpf/ ethtool/
 obj-$(CONFIG_NETFILTER)		+= netfilter/
 obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_TLS)		+= tls/
+obj-$(CONFIG_QUIC)		+= quic/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX_SCM)		+= unix/
 obj-y				+= ipv6/
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index bbdd9c44f14e..88d3baf4af95 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,8 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o
+	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o \
+	     udp_ulp.o
 
 obj-$(CONFIG_BPFILTER) += bpfilter/
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 34eda973bbf1..027c4513a9cd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2779,6 +2779,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->pcflag |= UDPLITE_RECV_CC;
 		break;
 
+	case UDP_ULP:
+		return udp_setsockopt_ulp(sk, optval, optlen);
+
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -2847,6 +2850,9 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
 		val = up->pcrlen;
 		break;
 
+	case UDP_ULP:
+		return udp_getsockopt_ulp(sk, optval, optlen);
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/udp_ulp.c b/net/ipv4/udp_ulp.c
new file mode 100644
index 000000000000..138818690151
--- /dev/null
+++ b/net/ipv4/udp_ulp.c
@@ -0,0 +1,192 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Pluggable UDP upper layer protocol support, based on pluggable TCP upper
+ * layer protocol support.
+ *
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwatson@fb.com>. All rights
+ * reserved.
+ * Copyright (c) 2021-2022, Meta Platforms, Inc. All rights reserved.
+ */
+
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/skmsg.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+
+static DEFINE_SPINLOCK(udp_ulp_list_lock);
+static LIST_HEAD(udp_ulp_list);
+
+/* Simple linear search, don't expect many entries! */
+static struct udp_ulp_ops *udp_ulp_find(const char *name)
+{
+	struct udp_ulp_ops *e;
+
+	list_for_each_entry_rcu(e, &udp_ulp_list, list,
+				lockdep_is_held(&udp_ulp_list_lock)) {
+		if (strcmp(e->name, name) == 0)
+			return e;
+	}
+
+	return NULL;
+}
+
+static const struct udp_ulp_ops *__udp_ulp_find_autoload(const char *name)
+{
+	const struct udp_ulp_ops *ulp = NULL;
+
+	rcu_read_lock();
+	ulp = udp_ulp_find(name);
+
+#ifdef CONFIG_MODULES
+	if (!ulp && capable(CAP_NET_ADMIN)) {
+		rcu_read_unlock();
+		request_module("udp-ulp-%s", name);
+		rcu_read_lock();
+		ulp = udp_ulp_find(name);
+	}
+#endif
+	if (!ulp || !try_module_get(ulp->owner))
+		ulp = NULL;
+
+	rcu_read_unlock();
+	return ulp;
+}
+
+/* Attach new upper layer protocol to the list
+ * of available protocols.
+ */
+int udp_register_ulp(struct udp_ulp_ops *ulp)
+{
+	int ret = 0;
+
+	spin_lock(&udp_ulp_list_lock);
+	if (udp_ulp_find(ulp->name))
+		ret = -EEXIST;
+	else
+		list_add_tail_rcu(&ulp->list, &udp_ulp_list);
+
+	spin_unlock(&udp_ulp_list_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(udp_register_ulp);
+
+void udp_unregister_ulp(struct udp_ulp_ops *ulp)
+{
+	spin_lock(&udp_ulp_list_lock);
+	list_del_rcu(&ulp->list);
+	spin_unlock(&udp_ulp_list_lock);
+
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(udp_unregister_ulp);
+
+void udp_cleanup_ulp(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	/* No sock_owned_by_me() check here as at the time the
+	 * stack calls this function, the socket is dead and
+	 * about to be destroyed.
+	 */
+	if (!inet->udp_ulp_ops)
+		return;
+
+	if (inet->udp_ulp_ops->release)
+		inet->udp_ulp_ops->release(sk);
+	module_put(inet->udp_ulp_ops->owner);
+
+	inet->udp_ulp_ops = NULL;
+}
+
+static int __udp_set_ulp(struct sock *sk, const struct udp_ulp_ops *ulp_ops)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int err;
+
+	err = -EEXIST;
+	if (inet->udp_ulp_ops)
+		goto out_err;
+
+	err = ulp_ops->init(sk);
+	if (err)
+		goto out_err;
+
+	inet->udp_ulp_ops = ulp_ops;
+	return 0;
+
+out_err:
+	module_put(ulp_ops->owner);
+	return err;
+}
+
+int udp_set_ulp(struct sock *sk, const char *name)
+{
+	struct sk_psock *psock = sk_psock_get(sk);
+	const struct udp_ulp_ops *ulp_ops;
+
+	if (psock) {
+		sk_psock_put(sk, psock);
+		return -EINVAL;
+	}
+
+	sock_owned_by_me(sk);
+	ulp_ops = __udp_ulp_find_autoload(name);
+	if (!ulp_ops)
+		return -ENOENT;
+
+	return __udp_set_ulp(sk, ulp_ops);
+}
+
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	char name[UDP_ULP_NAME_MAX];
+	int val, err;
+
+	if (!optlen || optlen > UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	val = strncpy_from_sockptr(name, optval, optlen);
+	if (val < 0)
+		return -EFAULT;
+
+	if (val == UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	name[val] = 0;
+	lock_sock(sk);
+	err = udp_set_ulp(sk, name);
+	release_sock(sk);
+	return err;
+}
+
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval, int __user *optlen)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int len;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	len = min_t(unsigned int, len, UDP_ULP_NAME_MAX);
+	if (len < 0)
+		return -EINVAL;
+
+	if (!inet->udp_ulp_ops) {
+		if (put_user(0, optlen))
+			return -EFAULT;
+		return 0;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, inet->udp_ulp_ops->name, len))
+		return -EFAULT;
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next 4/6] Implement QUIC offload functions
  2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (2 preceding siblings ...)
  2022-08-16 18:11   ` [net-next 3/6] Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
@ 2022-08-16 18:11   ` Adel Abouchaev
  2022-08-16 18:11   ` [net-next 5/6] Add flow counters and Tx processing error counter Adel Abouchaev
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-16 18:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add a connection hash to the context to support add and remove operations
on QUIC connections for the control plane and lookups for the data
plane. Implement setsockopt and add placeholders to add and delete Tx
connections.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Added net/quic/Kconfig reference to net/Kconfig in this commit.

Initialized pointers with NULL instead of 0. Restricted the AES counter to
__le32. Added address space qualifiers to user space addresses. Removed empty
lines. Updated code alignment. Removed inlines.

v3: removed ITER_KVEC flag from iov_iter_kvec call.
v3: fixed Chacha20 encryption bug.
---
 include/net/quic.h   |   53 ++
 net/Kconfig          |    1 +
 net/ipv4/udp.c       |    9 +
 net/quic/Kconfig     |   16 +
 net/quic/Makefile    |    8 +
 net/quic/quic_main.c | 1371 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1458 insertions(+)
 create mode 100644 include/net/quic.h
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c

diff --git a/include/net/quic.h b/include/net/quic.h
new file mode 100644
index 000000000000..cafe01174e60
--- /dev/null
+++ b/include/net/quic.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef INCLUDE_NET_QUIC_H
+#define INCLUDE_NET_QUIC_H
+
+#include <linux/mutex.h>
+#include <linux/rhashtable.h>
+#include <linux/skmsg.h>
+#include <uapi/linux/quic.h>
+
+#define QUIC_MAX_SHORT_HEADER_SIZE      25
+#define QUIC_MAX_CONNECTION_ID_SIZE     20
+#define QUIC_HDR_MASK_SIZE              16
+#define QUIC_MAX_GSO_FRAGS              16
+
+// Maximum IV and nonce sizes should be in sync with supported ciphers.
+#define QUIC_CIPHER_MAX_IV_SIZE		12
+#define QUIC_CIPHER_MAX_NONCE_SIZE	16
+
+/* Sideband (ancillary) data flags for QUIC egress operations */
+#define QUIC_ANCILLARY_FLAGS    (QUIC_BYPASS_ENCRYPTION)
+
+#define QUIC_MAX_IOVEC_SEGMENTS		8
+#define QUIC_MAX_SG_ALLOC_ELEMENTS	32
+#define QUIC_MAX_PLAIN_PAGES		16
+#define QUIC_MAX_CIPHER_PAGES_ORDER	4
+
+struct quic_internal_crypto_context {
+	struct quic_connection_info	conn_info;
+	struct crypto_skcipher		*header_tfm;
+	struct crypto_aead		*packet_aead;
+};
+
+struct quic_connection_rhash {
+	struct rhash_head			node;
+	struct quic_internal_crypto_context	crypto_ctx;
+	struct rcu_head				rcu;
+};
+
+struct quic_context {
+	struct proto		*sk_proto;
+	struct rhashtable	tx_connections;
+	struct scatterlist	sg_alloc[QUIC_MAX_SG_ALLOC_ELEMENTS];
+	struct page		*cipher_page;
+	/* To synchronize concurrent sendmsg() requests through the same socket
+	 * and protect preallocated per-context memory.
+	 */
+	struct mutex		sendmsg_mux;
+	struct rcu_head		rcu;
+};
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 6b78f695caa6..93e3b1308aec 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -63,6 +63,7 @@ menu "Networking options"
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/tls/Kconfig"
+source "net/quic/Kconfig"
 source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 source "net/smc/Kconfig"
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 027c4513a9cd..e7cbbea9d8d9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,6 +113,7 @@
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
+#include <uapi/linux/quic.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6_stubs.h>
 #endif
@@ -1011,6 +1012,14 @@ static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
 			return -EINVAL;
 		*gso_size = *(__u16 *)CMSG_DATA(cmsg);
 		return 0;
+	case UDP_QUIC_ENCRYPT:
+		/* This option is handled in UDP_ULP and is only checked
+		 * here for the bypass bit
+		 */
+		if (cmsg->cmsg_len !=
+		    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+			return -EINVAL;
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/quic/Kconfig b/net/quic/Kconfig
new file mode 100644
index 000000000000..661cb989508a
--- /dev/null
+++ b/net/quic/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# QUIC configuration
+#
+config QUIC
+	tristate "QUIC encryption offload"
+	depends on INET
+	select CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_GCM
+	help
+	  Enable kernel support for QUIC crypto offload. Currently only Tx
+	  encryption offload is supported. The kernel will perform
+	  copy-during-encryption.
+
+	  If unsure, say N.
diff --git a/net/quic/Makefile b/net/quic/Makefile
new file mode 100644
index 000000000000..928239c4d08c
--- /dev/null
+++ b/net/quic/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the QUIC subsystem
+#
+
+obj-$(CONFIG_QUIC) += quic.o
+
+quic-y := quic_main.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
new file mode 100644
index 000000000000..95de3a961479
--- /dev/null
+++ b/net/quic/quic_main.c
@@ -0,0 +1,1371 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <crypto/skcipher.h>
+#include <linux/bug.h>
+#include <linux/module.h>
+#include <linux/rhashtable.h>
+// Include header to use TLS constants for AEAD cipher.
+#include <net/tls.h>
+#include <net/quic.h>
+#include <net/udp.h>
+#include <uapi/linux/quic.h>
+
+static unsigned long af_init_done;
+static struct proto quic_v4_proto;
+static struct proto quic_v6_proto;
+static DEFINE_SPINLOCK(quic_proto_lock);
+
+static u32 quic_tx_connection_hash(const void *data, u32 len, u32 seed)
+{
+	return jhash(data, len, seed);
+}
+
+static u32 quic_tx_connection_hash_obj(const void *data, u32 len, u32 seed)
+{
+	const struct quic_connection_rhash *connhash = data;
+
+	return jhash(&connhash->crypto_ctx.conn_info.key,
+		     sizeof(struct quic_connection_info_key), seed);
+}
+
+static int quic_tx_connection_hash_cmp(struct rhashtable_compare_arg *arg,
+				       const void *ptr)
+{
+	const struct quic_connection_info_key *key = arg->key;
+	const struct quic_connection_rhash *x = ptr;
+
+	return !!memcmp(&x->crypto_ctx.conn_info.key,
+			key,
+			sizeof(struct quic_connection_info_key));
+}
+
+static const struct rhashtable_params quic_tx_connection_params = {
+	.key_len		= sizeof(struct quic_connection_info_key),
+	.head_offset		= offsetof(struct quic_connection_rhash, node),
+	.hashfn			= quic_tx_connection_hash,
+	.obj_hashfn		= quic_tx_connection_hash_obj,
+	.obj_cmpfn		= quic_tx_connection_hash_cmp,
+	.automatic_shrinking	= true,
+};
+
+static size_t quic_crypto_key_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_KEY_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_KEY_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_KEY_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type\n");
+	return 0;
+}
+
+static size_t quic_crypto_tag_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_TAG_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_TAG_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return 0;
+}
+
+static size_t quic_crypto_nonce_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_128_SALT_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_256_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_256_SALT_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_CCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_CCM_128_SALT_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+			     TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+		       TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return 0;
+}
+
+static u8 *quic_payload_iv(struct quic_internal_crypto_context *crypto_ctx)
+{
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return crypto_ctx->conn_info.aes_gcm_128.payload_iv;
+	case TLS_CIPHER_AES_GCM_256:
+		return crypto_ctx->conn_info.aes_gcm_256.payload_iv;
+	case TLS_CIPHER_AES_CCM_128:
+		return crypto_ctx->conn_info.aes_ccm_128.payload_iv;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return crypto_ctx->conn_info.chacha20_poly1305.payload_iv;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return NULL;
+}
+
+static int
+quic_config_header_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_skcipher *tfm;
+	char *header_cipher;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_128.header_key;
+		break;
+	case TLS_CIPHER_AES_GCM_256:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_256.header_key;
+		break;
+	case TLS_CIPHER_AES_CCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_ccm_128.header_key;
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		header_cipher = "chacha20";
+		key = crypto_ctx->conn_info.chacha20_poly1305.header_key;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	tfm = crypto_alloc_skcipher(header_cipher, 0, 0);
+	if (IS_ERR(tfm)) {
+		rc = PTR_ERR(tfm);
+		goto out;
+	}
+
+	rc = crypto_skcipher_setkey(tfm, key,
+				    quic_crypto_key_size(crypto_ctx->conn_info
+							 .cipher_type));
+	if (rc) {
+		crypto_free_skcipher(tfm);
+		goto out;
+	}
+
+	crypto_ctx->header_tfm = tfm;
+
+out:
+	return rc;
+}
+
+static int
+quic_config_packet_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_aead *aead;
+	char *cipher_name;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128: {
+		key = crypto_ctx->conn_info.aes_gcm_128.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_GCM_256: {
+		key = crypto_ctx->conn_info.aes_gcm_256.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_CCM_128: {
+		key = crypto_ctx->conn_info.aes_ccm_128.payload_key;
+		cipher_name = "ccm(aes)";
+		break;
+	}
+	case TLS_CIPHER_CHACHA20_POLY1305: {
+		key = crypto_ctx->conn_info.chacha20_poly1305.payload_key;
+		cipher_name = "rfc7539(chacha20,poly1305)";
+		break;
+	}
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	aead = crypto_alloc_aead(cipher_name, 0, 0);
+	if (IS_ERR(aead)) {
+		rc = PTR_ERR(aead);
+		goto out;
+	}
+
+	rc = crypto_aead_setkey(aead, key,
+				quic_crypto_key_size(crypto_ctx->conn_info
+						     .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	rc = crypto_aead_setauthsize(aead,
+				     quic_crypto_tag_size(crypto_ctx->conn_info
+							  .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	crypto_ctx->packet_aead = aead;
+	goto out;
+
+free_aead:
+	crypto_free_aead(aead);
+
+out:
+	return rc;
+}
+
+static inline struct quic_context *quic_get_ctx(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	return (__force void *)rcu_access_pointer(inet->ulp_data);
+}
+
+static void quic_free_cipher_page(struct page *page)
+{
+	__free_pages(page, QUIC_MAX_CIPHER_PAGES_ORDER);
+}
+
+static struct quic_context *quic_ctx_create(void)
+{
+	struct quic_context *ctx;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+
+	mutex_init(&ctx->sendmsg_mux);
+	ctx->cipher_page = alloc_pages(GFP_KERNEL, QUIC_MAX_CIPHER_PAGES_ORDER);
+	if (!ctx->cipher_page)
+		goto out_err;
+
+	if (rhashtable_init(&ctx->tx_connections,
+			    &quic_tx_connection_params) < 0) {
+		quic_free_cipher_page(ctx->cipher_page);
+		goto out_err;
+	}
+
+	return ctx;
+
+out_err:
+	kfree(ctx);
+	return NULL;
+}
+
+static int quic_getsockopt(struct sock *sk, int level, int optname,
+			   char __user *optval, int __user *optlen)
+{
+	struct quic_context *ctx = quic_get_ctx(sk);
+
+	return ctx->sk_proto->getsockopt(sk, level, optname, optval, optlen);
+}
+
+static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	int rc = 0;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	connhash = kzalloc(sizeof(*connhash), GFP_KERNEL);
+	if (!connhash)
+		return -ENOMEM;
+
+	crypto_ctx = &connhash->crypto_ctx;
+	rc = copy_from_sockptr(&crypto_ctx->conn_info, optval,
+			       sizeof(crypto_ctx->conn_info));
+	if (rc) {
+		rc = -EFAULT;
+		goto err_crypto_info;
+	}
+
+	// create all TLS materials for packet and header decryption
+	rc = quic_config_header_crypto(crypto_ctx);
+	if (rc)
+		goto err_crypto_info;
+
+	rc = quic_config_packet_crypto(crypto_ctx);
+	if (rc)
+		goto err_free_skcipher;
+
+	// insert crypto data into hash per connection ID
+	rc = rhashtable_insert_fast(&ctx->tx_connections, &connhash->node,
+				    quic_tx_connection_params);
+	if (rc < 0)
+		goto err_free_ciphers;
+
+	return 0;
+
+err_free_ciphers:
+	crypto_free_aead(crypto_ctx->packet_aead);
+
+err_free_skcipher:
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+
+err_crypto_info:
+	// wipe out all crypto material
+	memzero_explicit(&connhash->crypto_ctx, sizeof(connhash->crypto_ctx));
+	kfree(connhash);
+	return rc;
+}
+
+static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	struct quic_connection_info conn_info;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&conn_info, optval, optlen))
+		return -EFAULT;
+
+	connhash = rhashtable_lookup_fast(&ctx->tx_connections,
+					  &conn_info.key,
+					  quic_tx_connection_params);
+	if (!connhash)
+		return -EINVAL;
+
+	rhashtable_remove_fast(&ctx->tx_connections,
+			       &connhash->node,
+			       quic_tx_connection_params);
+
+	crypto_ctx = &connhash->crypto_ctx;
+
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+	crypto_free_aead(crypto_ctx->packet_aead);
+	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	kfree(connhash);
+
+	return 0;
+}
+
+static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
+			      unsigned int optlen)
+{
+	int rc = 0;
+
+	switch (optname) {
+	case UDP_QUIC_ADD_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_add_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	case UDP_QUIC_DEL_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_del_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	default:
+		rc = -ENOPROTOOPT;
+		break;
+	}
+
+	return rc;
+}
+
+static int quic_setsockopt(struct sock *sk, int level, int optname,
+			   sockptr_t optval, unsigned int optlen)
+{
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	sk_proto = ctx->sk_proto;
+	rcu_read_unlock();
+
+	if (level == SOL_UDP &&
+	    (optname == UDP_QUIC_ADD_TX_CONNECTION ||
+	     optname == UDP_QUIC_DEL_TX_CONNECTION))
+		return do_quic_setsockopt(sk, optname, optval, optlen);
+
+	return sk_proto->setsockopt(sk, level, optname, optval, optlen);
+}
+
+static int
+quic_extract_ancillary_data(struct msghdr *msg,
+			    struct quic_tx_ancillary_data *ancillary_data,
+			    u16 *udp_pkt_size)
+{
+	struct cmsghdr *cmsg_hdr = NULL;
+	void *ancillary_data_ptr = NULL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	for_each_cmsghdr(cmsg_hdr, msg) {
+		if (!CMSG_OK(msg, cmsg_hdr))
+			return -EINVAL;
+
+		if (cmsg_hdr->cmsg_level != IPPROTO_UDP)
+			continue;
+
+		if (cmsg_hdr->cmsg_type == UDP_QUIC_ENCRYPT) {
+			if (cmsg_hdr->cmsg_len !=
+			    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+				return -EINVAL;
+			memcpy((void *)ancillary_data, CMSG_DATA(cmsg_hdr),
+			       sizeof(struct quic_tx_ancillary_data));
+			ancillary_data_ptr = cmsg_hdr;
+		} else if (cmsg_hdr->cmsg_type == UDP_SEGMENT) {
+			if (cmsg_hdr->cmsg_len != CMSG_LEN(sizeof(u16)))
+				return -EINVAL;
+			memcpy((void *)udp_pkt_size, CMSG_DATA(cmsg_hdr),
+			       sizeof(u16));
+		}
+	}
+
+	if (!ancillary_data_ptr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int quic_sendmsg_validate(struct msghdr *msg)
+{
+	if (!iter_is_iovec(&msg->msg_iter))
+		return -EINVAL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct quic_connection_rhash
+*quic_lookup_connection(struct quic_context *ctx,
+			u8 *conn_id,
+			struct quic_tx_ancillary_data *ancillary_data)
+{
+	struct quic_connection_info_key conn_key;
+
+	// Lookup connection information by the connection key.
+	memset(&conn_key, 0, sizeof(struct quic_connection_info_key));
+	// fill the connection id up to the max connection ID length
+	if (ancillary_data->conn_id_length > QUIC_MAX_CONNECTION_ID_SIZE)
+		return NULL;
+
+	conn_key.conn_id_length = ancillary_data->conn_id_length;
+	if (ancillary_data->conn_id_length)
+		memcpy(conn_key.conn_id,
+		       conn_id,
+		       ancillary_data->conn_id_length);
+	return rhashtable_lookup_fast(&ctx->tx_connections,
+				      &conn_key,
+				      quic_tx_connection_params);
+}
+
+static int quic_sg_capacity_from_msg(const size_t pkt_size,
+				     const off_t offset,
+				     const size_t length)
+{
+	size_t	pages = 0;
+	size_t	pkts = 0;
+
+	pages = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+	pkts = DIV_ROUND_UP(length, pkt_size);
+	return pages + pkts + 1;
+}
+
+static void quic_put_plain_user_pages(struct page **pages, size_t nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (i == 0 || pages[i] != pages[i - 1])
+			put_page(pages[i]);
+}
+
+static int quic_get_plain_user_pages(struct msghdr * const msg,
+				     struct page **pages,
+				     int *page_indices)
+{
+	size_t	nr_mapped = 0;
+	size_t	nr_pages = 0;
+	void __user	*data_addr;
+	void	*page_addr;
+	size_t	count = 0;
+	off_t	data_off;
+	int	ret = 0;
+	int	i;
+
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		data_addr = msg->msg_iter.iov[i].iov_base;
+		if (!i)
+			data_addr += msg->msg_iter.iov_offset;
+		page_addr =
+			(void *)((unsigned long)data_addr & PAGE_MASK);
+
+		data_off = (unsigned long)data_addr & ~PAGE_MASK;
+		nr_pages =
+			DIV_ROUND_UP(data_off + msg->msg_iter.iov[i].iov_len,
+				     PAGE_SIZE);
+		if (nr_mapped + nr_pages > QUIC_MAX_PLAIN_PAGES) {
+			quic_put_plain_user_pages(pages, nr_mapped);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		count = get_user_pages((unsigned long)page_addr, nr_pages, 1,
+				       pages, NULL);
+		if (count < nr_pages) {
+			quic_put_plain_user_pages(pages, nr_mapped + count);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		page_indices[i] = nr_mapped;
+		nr_mapped += count;
+		pages += count;
+	}
+	ret = nr_mapped;
+
+out:
+	return ret;
+}
+
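+/* Pin the user pages backing the message iovec and build one scatterlist per
+ * QUIC packet, each covering at most pkt_size bytes of plaintext. Returns the
+ * number of packet scatterlists built, or a negative error code.
+ */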
+static int quic_sg_plain_from_mapped_msg(struct msghdr * const msg,
+					 struct page **plain_pages,
+					 void **iov_base_ptrs,
+					 void **iov_data_ptrs,
+					 const size_t plain_size,
+					 const size_t pkt_size,
+					 struct scatterlist * const sg_alloc,
+					 const size_t max_sg_alloc,
+					 struct scatterlist ** const sg_pkts,
+					 size_t *nr_plain_pages)
+{
+	int iov_page_indices[QUIC_MAX_IOVEC_SEGMENTS];
+	struct scatterlist *sg;
+	unsigned int pkt_i = 0;
+	ssize_t left_on_page;
+	size_t pkt_left;
+	unsigned int i;
+	size_t seg_len;
+	off_t page_ofs;
+	off_t seg_ofs;
+	int ret = 0;
+	int page_i;
+
+	if (msg->msg_iter.nr_segs >= QUIC_MAX_IOVEC_SEGMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = quic_get_plain_user_pages(msg, plain_pages, iov_page_indices);
+	if (ret < 0)
+		goto out;
+
+	*nr_plain_pages = ret;
+	sg = sg_alloc;
+	sg_pkts[pkt_i] = sg;
+	sg_unmark_end(sg);
+	pkt_left = pkt_size;
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		page_ofs = ((unsigned long)msg->msg_iter.iov[i].iov_base
+			   & (PAGE_SIZE - 1));
+		page_i = 0;
+		if (!i) {
+			page_ofs += msg->msg_iter.iov_offset;
+			while (page_ofs >= PAGE_SIZE) {
+				page_ofs -= PAGE_SIZE;
+				page_i++;
+			}
+		}
+
+		seg_len = msg->msg_iter.iov[i].iov_len;
+		page_i += iov_page_indices[i];
+
+		if (page_i >= QUIC_MAX_PLAIN_PAGES)
+			return -EFAULT;
+
+		seg_ofs = 0;
+		while (seg_ofs < seg_len) {
+			if (sg - sg_alloc > max_sg_alloc)
+				return -EFAULT;
+
+			sg_unmark_end(sg);
+			left_on_page = min_t(size_t, PAGE_SIZE - page_ofs,
+					     seg_len - seg_ofs);
+			if (left_on_page <= 0)
+				return -EFAULT;
+
+			if (left_on_page > pkt_left) {
+				sg_set_page(sg, plain_pages[page_i], pkt_left,
+					    page_ofs);
+				pkt_i++;
+				seg_ofs += pkt_left;
+				page_ofs += pkt_left;
+				sg_mark_end(sg);
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+				continue;
+			}
+			sg_set_page(sg, plain_pages[page_i], left_on_page,
+				    page_ofs);
+			page_i++;
+			page_ofs = 0;
+			seg_ofs += left_on_page;
+			pkt_left -= left_on_page;
+			if (pkt_left == 0 ||
+			    (seg_ofs == seg_len &&
+			     i == msg->msg_iter.nr_segs - 1)) {
+				sg_mark_end(sg);
+				pkt_i++;
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+			} else {
+				sg++;
+			}
+		}
+	}
+
+	if (pkt_left && pkt_left != pkt_size) {
+		pkt_i++;
+		sg_mark_end(sg);
+	}
+	ret = pkt_i;
+
+out:
+	return ret;
+}
+
+/* sg_alloc: allocated zeroed array of scatterlists
+ * cipher_page: preallocated compound page
+ */
+static int quic_sg_cipher_from_pkts(const size_t cipher_tag_size,
+				    const size_t plain_pkt_size,
+				    const size_t plain_size,
+				    struct page * const cipher_page,
+				    struct scatterlist * const sg_alloc,
+				    const size_t nr_sg_alloc,
+				    struct scatterlist ** const sg_cipher)
+{
+	const size_t cipher_pkt_size = plain_pkt_size + cipher_tag_size;
+	size_t pkts = DIV_ROUND_UP(plain_size, plain_pkt_size);
+	struct scatterlist *sg = sg_alloc;
+	int pkt_i;
+	void *ptr;
+
+	if (pkts > nr_sg_alloc)
+		return -EINVAL;
+
+	ptr = page_address(cipher_page);
+	for (pkt_i = 0; pkt_i < pkts;
+		++pkt_i, ptr += cipher_pkt_size, ++sg) {
+		sg_set_buf(sg, ptr, cipher_pkt_size);
+		sg_mark_end(sg);
+		sg_cipher[pkt_i] = sg;
+	}
+	return pkts;
+}
+
+/* fast copy from scatterlist to a buffer assuming that all pages are
+ * available in kernel memory.
+ */
+static int quic_sg_pcopy_to_buffer_kernel(struct scatterlist *sg,
+					  u8 *buffer,
+					  size_t bytes_to_copy,
+					  off_t offset_to_read)
+{
+	off_t sg_remain = sg->length;
+	size_t to_copy;
+
+	if (!bytes_to_copy)
+		return 0;
+
+	// skip to offset first
+	while (offset_to_read > 0) {
+		if (!sg_remain)
+			return -EINVAL;
+		if (offset_to_read < sg_remain) {
+			sg_remain -= offset_to_read;
+			break;
+		}
+		offset_to_read -= sg_remain;
+		sg = sg_next(sg);
+		if (!sg)
+			return -EINVAL;
+		sg_remain = sg->length;
+	}
+
+	// traverse sg list from offset to offset + bytes_to_copy
+	while (bytes_to_copy) {
+		to_copy = min_t(size_t, bytes_to_copy, sg_remain);
+		if (!to_copy)
+			return -EINVAL;
+		memcpy(buffer, sg_virt(sg) + (sg->length - sg_remain), to_copy);
+		buffer += to_copy;
+		bytes_to_copy -= to_copy;
+		if (bytes_to_copy) {
+			sg = sg_next(sg);
+			if (!sg)
+				return -EINVAL;
+			sg_remain = sg->length;
+		}
+	}
+
+	return 0;
+}
+
+static int quic_copy_header(struct scatterlist *sg_plain,
+			    u8 *buf, const size_t buf_len,
+			    const size_t conn_id_len)
+{
+	u8 *pkt = sg_virt(sg_plain);
+	size_t hdr_len;
+
+	hdr_len = 1 + conn_id_len + ((*pkt & 0x03) + 1);
+	if (hdr_len > QUIC_MAX_SHORT_HEADER_SIZE || hdr_len > buf_len)
+		return -EINVAL;
+
+	WARN_ON_ONCE(quic_sg_pcopy_to_buffer_kernel(sg_plain, buf, hdr_len, 0));
+	return hdr_len;
+}
+
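+/* Recover the full packet number from the truncated value in the short
+ * header, using the expected next packet number supplied in the ancillary
+ * data (packet number decoding, RFC 9000 Appendix A.3).
+ */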
+static u64 quic_unpack_pkt_num(struct quic_tx_ancillary_data * const control,
+			       const u8 * const hdr,
+			       const off_t payload_crypto_off)
+{
+	u64 truncated_pn = 0;
+	u64 candidate_pn;
+	u64 expected_pn;
+	u64 pn_hwin;
+	u64 pn_mask;
+	u64 pn_len;
+	u64 pn_win;
+	int i;
+
+	pn_len = (hdr[0] & 0x03) + 1;
+	expected_pn = control->next_pkt_num;
+
+	for (i = 1 + control->conn_id_length; i < payload_crypto_off; ++i) {
+		truncated_pn <<= 8;
+		truncated_pn |= hdr[i];
+	}
+
+	pn_win = 1ULL << (pn_len << 3);
+	pn_hwin = pn_win >> 1;
+	pn_mask = pn_win - 1;
+	candidate_pn = (expected_pn & ~pn_mask) | truncated_pn;
+
+	if (expected_pn > pn_hwin &&
+	    candidate_pn <= expected_pn - pn_hwin &&
+	    candidate_pn < (1ULL << 62) - pn_win)
+		return candidate_pn + pn_win;
+
+	if (candidate_pn > expected_pn + pn_hwin &&
+	    candidate_pn >= pn_win)
+		return candidate_pn - pn_win;
+
+	return candidate_pn;
+}
+
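+/* Derive the header protection mask from a sample of the ciphertext
+ * (RFC 9001, Section 5.4). AES-based ciphers encrypt the sample with
+ * AES-ECB; ChaCha20 uses the first four sample bytes as the block counter
+ * and the rest as the nonce to encrypt five zero bytes.
+ */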
+static int
+quic_construct_header_prot_mask(struct quic_internal_crypto_context *crypto_ctx,
+				struct skcipher_request *hdr_mask_req,
+				struct scatterlist *sg_cipher_pkt,
+				off_t sample_offset,
+				u8 *hdr_mask)
+{
+	u8 *sample = sg_virt(sg_cipher_pkt) + sample_offset;
+	u8 hdr_ctr[sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE];
+	u8 chacha20_zeros[5] = {0, 0, 0, 0, 0};
+	struct scatterlist sg_cipher_sample;
+	struct scatterlist sg_hdr_mask;
+	DECLARE_CRYPTO_WAIT(wait_header);
+	__le32	counter;
+
+	BUILD_BUG_ON(QUIC_HDR_MASK_SIZE
+		     < sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE);
+
+	sg_init_one(&sg_hdr_mask, hdr_mask, QUIC_HDR_MASK_SIZE);
+	skcipher_request_set_callback(hdr_mask_req, 0, crypto_req_done,
+				      &wait_header);
+
+	if (crypto_ctx->conn_info.cipher_type == TLS_CIPHER_CHACHA20_POLY1305) {
+		sg_init_one(&sg_cipher_sample, (u8 *)chacha20_zeros,
+			    sizeof(chacha20_zeros));
+		counter = cpu_to_le32(*((u32 *)sample));
+		memset(hdr_ctr, 0, sizeof(hdr_ctr));
+		memcpy((u8 *)hdr_ctr, (u8 *)&counter, sizeof(u32));
+		memcpy((u8 *)hdr_ctr + sizeof(u32),
+		       (sample + sizeof(u32)),
+		       QUIC_CIPHER_MAX_IV_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, 5, hdr_ctr);
+	} else {
+		// cipher pages are continuous, get the pointer to the sg data
+		// directly, pages are allocated in kernel
+		sg_init_one(&sg_cipher_sample, sample, QUIC_HDR_MASK_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, QUIC_HDR_MASK_SIZE,
+					   NULL);
+	}
+
+	return crypto_wait_req(crypto_skcipher_encrypt(hdr_mask_req),
+			       &wait_header);
+}
+
+static int quic_protect_header(struct quic_internal_crypto_context *crypto_ctx,
+			       struct quic_tx_ancillary_data *control,
+			       struct skcipher_request *hdr_mask_req,
+			       struct scatterlist *sg_cipher_pkt,
+			       int payload_crypto_off)
+{
+	u8 hdr_mask[QUIC_HDR_MASK_SIZE];
+	off_t quic_pkt_num_off;
+	u8 quic_pkt_num_len;
+	u8 *cipher_hdr;
+	int err;
+	int i;
+
+	quic_pkt_num_off = 1 + control->conn_id_length;
+	quic_pkt_num_len = payload_crypto_off - quic_pkt_num_off;
+
+	if (quic_pkt_num_len > 4)
+		return -EPERM;
+
+	err = quic_construct_header_prot_mask(crypto_ctx, hdr_mask_req,
+					      sg_cipher_pkt,
+					      payload_crypto_off +
+					      (4 - quic_pkt_num_len),
+					      hdr_mask);
+	if (unlikely(err))
+		return err;
+
+	cipher_hdr = sg_virt(sg_cipher_pkt);
+	// protect the public flags
+	cipher_hdr[0] ^= (hdr_mask[0] & 0x1f);
+
+	for (i = 0; i < quic_pkt_num_len; ++i)
+		cipher_hdr[quic_pkt_num_off + i] ^= hdr_mask[1 + i];
+
+	return 0;
+}
+
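+/* Build the AEAD nonce by XORing the packet number into the low-order bytes
+ * of the per-connection IV (RFC 9001, Section 5.3).
+ */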
+static
+void quic_construct_ietf_nonce(u8 *nonce,
+			       struct quic_internal_crypto_context *crypto_ctx,
+			       u64 quic_pkt_num)
+{
+	u8 *iv = quic_payload_iv(crypto_ctx);
+	int i;
+
+	for (i = quic_crypto_nonce_size(crypto_ctx->conn_info.cipher_type) - 1;
+	     i >= 0 && quic_pkt_num;
+	     --i, quic_pkt_num >>= 8)
+		nonce[i] = iv[i] ^ (u8)quic_pkt_num;
+
+	for (; i >= 0; --i)
+		nonce[i] = iv[i];
+}
+
+static ssize_t quic_sendpage(struct quic_context *ctx,
+			     struct sock *sk,
+			     struct msghdr *msg,
+			     const size_t cipher_size,
+			     struct page * const cipher_page)
+{
+	struct kvec iov;
+	ssize_t ret;
+
+	iov.iov_base = page_address(cipher_page);
+	iov.iov_len = cipher_size;
+	iov_iter_kvec(&msg->msg_iter, WRITE, &iov, 1, cipher_size);
+	ret = security_socket_sendmsg(sk->sk_socket, msg, msg_data_left(msg));
+	if (ret)
+		return ret;
+
+	ret = ctx->sk_proto->sendmsg(sk, msg, msg_data_left(msg));
+	WARN_ON(ret == -EIOCBQUEUED);
+	return ret;
+}
+
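+/* Tx path: peek at the QUIC short header, look up the per-connection crypto
+ * context, split the plaintext into GSO-sized packets, AEAD-encrypt each
+ * packet into the preallocated cipher page, apply header protection and pass
+ * the ciphertext on to the underlying UDP socket.
+ */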
+static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_internal_crypto_context *crypto_ctx = NULL;
+	struct scatterlist *sg_cipher_pkts[QUIC_MAX_GSO_FRAGS];
+	struct scatterlist *sg_plain_pkts[QUIC_MAX_GSO_FRAGS];
+	struct page *plain_pages[QUIC_MAX_PLAIN_PAGES];
+	void *plain_base_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	void *plain_data_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	struct msghdr msg_cipher = {
+		.msg_name = msg->msg_name,
+		.msg_namelen = msg->msg_namelen,
+		.msg_flags = msg->msg_flags,
+		.msg_control = msg->msg_control,
+		.msg_controllen = msg->msg_controllen,
+	};
+	struct quic_connection_rhash *connhash = NULL;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	u8 hdr_buf[QUIC_MAX_SHORT_HEADER_SIZE];
+	struct skcipher_request *hdr_mask_req;
+	struct quic_tx_ancillary_data control;
+	u8 nonce[QUIC_CIPHER_MAX_NONCE_SIZE];
+	struct	aead_request *aead_req = NULL;
+	struct scatterlist *sg_cipher = NULL;
+	struct udp_sock *up = udp_sk(sk);
+	struct scatterlist *sg_plain = NULL;
+	u16 gso_pkt_size = up->gso_size;
+	size_t last_plain_pkt_size = 0;
+	off_t	payload_crypto_offset;
+	struct crypto_aead *tfm = NULL;
+	size_t nr_plain_pages = 0;
+	DECLARE_CRYPTO_WAIT(waiter);
+	size_t nr_sg_cipher_pkts;
+	size_t nr_sg_plain_pkts;
+	ssize_t hdr_buf_len = 0;
+	size_t nr_sg_alloc = 0;
+	size_t plain_pkt_size;
+	u64	full_pkt_num;
+	size_t cipher_size;
+	size_t plain_size;
+	ssize_t pkt_size;
+	size_t tag_size;
+	int ret = 0;
+	int pkt_i;
+	int err;
+
+	memset(&hdr_buf[0], 0, QUIC_MAX_SHORT_HEADER_SIZE);
+	hdr_buf_len = copy_from_iter(hdr_buf, QUIC_MAX_SHORT_HEADER_SIZE,
+				     &msg->msg_iter);
+	if (hdr_buf_len <= 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+	iov_iter_revert(&msg->msg_iter, hdr_buf_len);
+
+	// Bypass for anything that is guaranteed not QUIC.
+	plain_size = len;
+
+	if (plain_size < 2)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Bypass for other than short header.
+	if ((hdr_buf[0] & 0xc0) != 0x40)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Crypto adds a tag after the packet. Corking a payload would produce
+	// a crypto tag after each portion. Use GSO instead.
+	if ((msg->msg_flags & MSG_MORE) || up->pending) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = quic_sendmsg_validate(msg);
+	if (ret)
+		goto out;
+
+	ret = quic_extract_ancillary_data(msg, &control, &gso_pkt_size);
+	if (ret)
+		goto out;
+
+	// Reserved bits with ancillary data present are an error.
+	if (control.flags & ~QUIC_ANCILLARY_FLAGS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Bypass offload on request. First packet bypass applies to all
+	// packets in the GSO pack.
+	if (control.flags & QUIC_BYPASS_ENCRYPTION)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	if (hdr_buf_len < 1 + control.conn_id_length) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Fetch the flow
+	connhash = quic_lookup_connection(ctx, &hdr_buf[1], &control);
+	if (!connhash) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	crypto_ctx = &connhash->crypto_ctx;
+
+	tag_size = quic_crypto_tag_size(crypto_ctx->conn_info.cipher_type);
+
+	// For GSO, use the GSO size minus cipher tag size as the packet size;
+	// for non-GSO, use the size of the whole plaintext.
+	// Reduce the packet size by tag size to keep the original packet size
+	// for the rest of the UDP path in the stack.
+	if (!gso_pkt_size) {
+		plain_pkt_size = plain_size;
+	} else {
+		if (gso_pkt_size < tag_size) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		plain_pkt_size = gso_pkt_size - tag_size;
+	}
+
+	// Build scatterlist from the input data, split by GSO minus the
+	// crypto tag size.
+	nr_sg_alloc = quic_sg_capacity_from_msg(plain_pkt_size,
+						msg->msg_iter.iov_offset,
+						plain_size);
+	if ((nr_sg_alloc * 2) >= QUIC_MAX_SG_ALLOC_ELEMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	sg_plain = ctx->sg_alloc;
+	sg_cipher = sg_plain + nr_sg_alloc;
+
+	ret = quic_sg_plain_from_mapped_msg(msg, plain_pages,
+					    plain_base_ptrs,
+					    plain_data_ptrs, plain_size,
+					    plain_pkt_size, sg_plain,
+					    nr_sg_alloc, sg_plain_pkts,
+					    &nr_plain_pages);
+
+	if (ret < 0)
+		goto out;
+
+	nr_sg_plain_pkts = ret;
+	last_plain_pkt_size = plain_size % plain_pkt_size;
+	if (!last_plain_pkt_size)
+		last_plain_pkt_size = plain_pkt_size;
+
+	// Build scatterlist for the ciphertext, split by GSO.
+	cipher_size = plain_size + nr_sg_plain_pkts * tag_size;
+
+	if (DIV_ROUND_UP(cipher_size, PAGE_SIZE)
+	    >= (1 << QUIC_MAX_CIPHER_PAGES_ORDER)) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	ret = quic_sg_cipher_from_pkts(tag_size, plain_pkt_size, plain_size,
+				       ctx->cipher_page, sg_cipher, nr_sg_alloc,
+				       sg_cipher_pkts);
+	if (ret < 0)
+		goto out_put_pages;
+
+	nr_sg_cipher_pkts = ret;
+
+	if (nr_sg_plain_pkts != nr_sg_cipher_pkts) {
+		ret = -EPERM;
+		goto out_put_pages;
+	}
+
+	// Encrypt and protect header for each packet individually.
+	tfm = crypto_ctx->packet_aead;
+	crypto_aead_clear_flags(tfm, ~0);
+	aead_req = aead_request_alloc(tfm, GFP_KERNEL);
+	if (!aead_req) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	hdr_mask_req = skcipher_request_alloc(crypto_ctx->header_tfm,
+					      GFP_KERNEL);
+	if (!hdr_mask_req) {
+		aead_request_free(aead_req);
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	for (pkt_i = 0; pkt_i < nr_sg_plain_pkts; ++pkt_i) {
+		payload_crypto_offset =
+			quic_copy_header(sg_plain_pkts[pkt_i], hdr_buf,
+					 sizeof(hdr_buf),
+					 control.conn_id_length);
+		if (payload_crypto_offset < 0) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = payload_crypto_offset;
+			goto out_put_pages;
+		}
+
+		full_pkt_num = quic_unpack_pkt_num(&control, hdr_buf,
+						   payload_crypto_offset);
+
+		pkt_size = (pkt_i + 1 < nr_sg_plain_pkts
+				? plain_pkt_size
+				: last_plain_pkt_size)
+			    - payload_crypto_offset;
+		if (pkt_size < 0) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+
+		/* Construct nonce and initialize request */
+		quic_construct_ietf_nonce(nonce, crypto_ctx, full_pkt_num);
+
+		/* Encrypt the body */
+		aead_request_set_callback(aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG
+					  | CRYPTO_TFM_REQ_MAY_SLEEP,
+					  crypto_req_done, &waiter);
+		aead_request_set_crypt(aead_req, sg_plain_pkts[pkt_i],
+				       sg_cipher_pkts[pkt_i],
+				       pkt_size,
+				       nonce);
+		aead_request_set_ad(aead_req, payload_crypto_offset);
+		err = crypto_wait_req(crypto_aead_encrypt(aead_req), &waiter);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+
+		/* Protect the header */
+		memcpy(sg_virt(sg_cipher_pkts[pkt_i]), hdr_buf,
+		       payload_crypto_offset);
+
+		err = quic_protect_header(crypto_ctx, &control,
+					  hdr_mask_req,
+					  sg_cipher_pkts[pkt_i],
+					  payload_crypto_offset);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+	}
+	skcipher_request_free(hdr_mask_req);
+	aead_request_free(aead_req);
+
+	// Deliver to the next layer.
+	if (ctx->sk_proto->sendpage) {
+		msg_cipher.msg_flags |= MSG_MORE;
+		err = ctx->sk_proto->sendmsg(sk, &msg_cipher, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+
+		err = ctx->sk_proto->sendpage(sk, ctx->cipher_page, 0,
+					      cipher_size, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+		if (err != cipher_size) {
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		ret = plain_size;
+	} else {
+		ret = quic_sendpage(ctx, sk, &msg_cipher, cipher_size,
+				    ctx->cipher_page);
+		// indicate full plaintext transmission to the caller.
+		if (ret > 0)
+			ret = plain_size;
+	}
+
+out_put_pages:
+	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
+
+out:
+	return ret;
+}
+
+static int quic_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_context *ctx;
+	int ret;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	rcu_read_unlock();
+	if (!ctx)
+		return -EINVAL;
+
+	mutex_lock(&ctx->sendmsg_mux);
+	ret = quic_sendmsg(sk, msg, len);
+	mutex_unlock(&ctx->sendmsg_mux);
+	return ret;
+}
+
+static void quic_release_resources(struct sock *sk)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_connection_rhash *connhash;
+	struct inet_sock *inet = inet_sk(sk);
+	struct rhashtable_iter hti;
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	if (!ctx) {
+		rcu_read_unlock();
+		return;
+	}
+
+	sk_proto = ctx->sk_proto;
+
+	rhashtable_walk_enter(&ctx->tx_connections, &hti);
+	rhashtable_walk_start(&hti);
+
+	while ((connhash = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(connhash)) {
+			if (PTR_ERR(connhash) == -EAGAIN)
+				continue;
+			break;
+		}
+
+		crypto_ctx = &connhash->crypto_ctx;
+		crypto_free_aead(crypto_ctx->packet_aead);
+		crypto_free_skcipher(crypto_ctx->header_tfm);
+		memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	}
+
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+	rhashtable_destroy(&ctx->tx_connections);
+
+	if (ctx->cipher_page) {
+		quic_free_cipher_page(ctx->cipher_page);
+		ctx->cipher_page = NULL;
+	}
+
+	rcu_read_unlock();
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, NULL);
+	WRITE_ONCE(sk->sk_prot, sk_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+
+	kfree_rcu(ctx, rcu);
+}
+
+static void
+quic_prep_protos(unsigned int af, struct proto *proto, const struct proto *base)
+{
+	if (likely(test_bit(af, &af_init_done)))
+		return;
+
+	spin_lock(&quic_proto_lock);
+	if (test_bit(af, &af_init_done))
+		goto out_unlock;
+
+	*proto			= *base;
+	proto->setsockopt	= quic_setsockopt;
+	proto->getsockopt	= quic_getsockopt;
+	proto->sendmsg		= quic_sendmsg_locked;
+
+	smp_mb__before_atomic(); /* proto calls should be visible first */
+	set_bit(af, &af_init_done);
+
+out_unlock:
+	spin_unlock(&quic_proto_lock);
+}
+
+static void quic_update_proto(struct sock *sk, struct quic_context *ctx)
+{
+	struct proto *udp_proto, *quic_proto;
+	struct inet_sock *inet = inet_sk(sk);
+
+	udp_proto = READ_ONCE(sk->sk_prot);
+	ctx->sk_proto = udp_proto;
+	quic_proto = sk->sk_family == AF_INET ? &quic_v4_proto : &quic_v6_proto;
+
+	quic_prep_protos(sk->sk_family, quic_proto, udp_proto);
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, ctx);
+	WRITE_ONCE(sk->sk_prot, quic_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+}
+
+static int quic_init(struct sock *sk)
+{
+	struct quic_context *ctx;
+
+	ctx = quic_ctx_create();
+	if (!ctx)
+		return -ENOMEM;
+
+	quic_update_proto(sk, ctx);
+
+	return 0;
+}
+
+static void quic_release(struct sock *sk)
+{
+	lock_sock(sk);
+	quic_release_resources(sk);
+	release_sock(sk);
+}
+
+static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
+	.name		= "quic-crypto",
+	.owner		= THIS_MODULE,
+	.init		= quic_init,
+	.release	= quic_release,
+};
+
+static int __init quic_register(void)
+{
+	udp_register_ulp(&quic_ulp_ops);
+	return 0;
+}
+
+static void __exit quic_unregister(void)
+{
+	udp_unregister_ulp(&quic_ulp_ops);
+}
+
+module_init(quic_register);
+module_exit(quic_unregister);
+
+MODULE_DESCRIPTION("QUIC crypto ULP");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_UDP_ULP("quic-crypto");
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next 5/6] Add flow counters and Tx processing error counter
  2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (3 preceding siblings ...)
  2022-08-16 18:11   ` [net-next 4/6] Implement QUIC offload functions Adel Abouchaev
@ 2022-08-16 18:11   ` Adel Abouchaev
  2022-08-16 18:11   ` [net-next 6/6] Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  2022-08-17  8:09   ` [net-next 0/6] net: support QUIC crypto Bagas Sanjaya
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-16 18:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add flow counters. The total flow counter is cumulative, the current
counter shows the number of flows currently in flight, and the error
counter accumulates the number of Tx processing errors.
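
For illustration only (not part of this patch): the counters are exported
through the per-netns /proc/net/quic_stat file added in quic_proc.c below.
A minimal userspace sketch that dumps them, assuming that file name and the
name/value-per-line format of quic_statistics_seq_show(), could look like:

  #include <stdio.h>

  int main(void)
  {
  	char line[256];
  	FILE *f = fopen("/proc/net/quic_stat", "r");

  	if (!f) {
  		perror("fopen");
  		return 1;
  	}
  	/* Each line is a counter name and its value, e.g. QuicTxSwError. */
  	while (fgets(line, sizeof(line), f))
  		fputs(line, stdout);
  	fclose(f);
  	return 0;
  }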

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Updated enum bracket to follow enum keyword. Removed extra blank lines.
---
 include/net/netns/mib.h   |  3 +++
 include/net/quic.h        | 10 +++++++++
 include/net/snmp.h        |  6 +++++
 include/uapi/linux/snmp.h |  9 ++++++++
 net/quic/Makefile         |  2 +-
 net/quic/quic_main.c      | 46 +++++++++++++++++++++++++++++++++++++++
 net/quic/quic_proc.c      | 45 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 net/quic/quic_proc.c

diff --git a/include/net/netns/mib.h b/include/net/netns/mib.h
index 7e373664b1e7..dcbba3d1ceec 100644
--- a/include/net/netns/mib.h
+++ b/include/net/netns/mib.h
@@ -24,6 +24,9 @@ struct netns_mib {
 #if IS_ENABLED(CONFIG_TLS)
 	DEFINE_SNMP_STAT(struct linux_tls_mib, tls_statistics);
 #endif
+#if IS_ENABLED(CONFIG_QUIC)
+	DEFINE_SNMP_STAT(struct linux_quic_mib, quic_statistics);
+#endif
 #ifdef CONFIG_MPTCP
 	DEFINE_SNMP_STAT(struct mptcp_mib, mptcp_statistics);
 #endif
diff --git a/include/net/quic.h b/include/net/quic.h
index cafe01174e60..6362d827d266 100644
--- a/include/net/quic.h
+++ b/include/net/quic.h
@@ -25,6 +25,16 @@
 #define QUIC_MAX_PLAIN_PAGES		16
 #define QUIC_MAX_CIPHER_PAGES_ORDER	4
 
+#define __QUIC_INC_STATS(net, field)				\
+	__SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_INC_STATS(net, field)				\
+	SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_DEC_STATS(net, field)				\
+	SNMP_DEC_STATS((net)->mib.quic_statistics, field)
+
+int __net_init quic_proc_init(struct net *net);
+void __net_exit quic_proc_fini(struct net *net);
+
 struct quic_internal_crypto_context {
 	struct quic_connection_info	conn_info;
 	struct crypto_skcipher		*header_tfm;
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 468a67836e2f..f94680a3e9e8 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -117,6 +117,12 @@ struct linux_tls_mib {
 	unsigned long	mibs[LINUX_MIB_TLSMAX];
 };
 
+/* Linux QUIC */
+#define LINUX_MIB_QUICMAX	__LINUX_MIB_QUICMAX
+struct linux_quic_mib {
+	unsigned long	mibs[LINUX_MIB_QUICMAX];
+};
+
 #define DEFINE_SNMP_STAT(type, name)	\
 	__typeof__(type) __percpu *name
 #define DEFINE_SNMP_STAT_ATOMIC(type, name)	\
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 4d7470036a8b..ca1e626dbdb4 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -349,4 +349,13 @@ enum
 	__LINUX_MIB_TLSMAX
 };
 
+/* linux QUIC mib definitions */
+enum {
+	LINUX_MIB_QUICNUM = 0,
+	LINUX_MIB_QUICCURRTXSW,			/* QuicCurrTxSw */
+	LINUX_MIB_QUICTXSW,			/* QuicTxSw */
+	LINUX_MIB_QUICTXSWERROR,		/* QuicTxSwError */
+	__LINUX_MIB_QUICMAX
+};
+
 #endif	/* _LINUX_SNMP_H */
diff --git a/net/quic/Makefile b/net/quic/Makefile
index 928239c4d08c..a885cd8bc4e0 100644
--- a/net/quic/Makefile
+++ b/net/quic/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_QUIC) += quic.o
 
-quic-y := quic_main.o
+quic-y := quic_main.o quic_proc.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
index 95de3a961479..4f2484fe43ed 100644
--- a/net/quic/quic_main.c
+++ b/net/quic/quic_main.c
@@ -335,6 +335,8 @@ static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
 	if (rc < 0)
 		goto err_free_ciphers;
 
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSW);
 	return 0;
 
 err_free_ciphers:
@@ -383,6 +385,7 @@ static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
 	crypto_free_aead(crypto_ctx->packet_aead);
 	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
 	kfree(connhash);
+	QUIC_DEC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
 
 	return 0;
 }
@@ -408,6 +411,9 @@ static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 		break;
 	}
 
+	if (rc)
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return rc;
 }
 
@@ -1213,6 +1219,9 @@ static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
 
 out:
+	if (unlikely(ret < 0))
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return ret;
 }
 
@@ -1345,6 +1354,36 @@ static void quic_release(struct sock *sk)
 	release_sock(sk);
 }
 
+static int __net_init quic_init_net(struct net *net)
+{
+	int err;
+
+	net->mib.quic_statistics = alloc_percpu(struct linux_quic_mib);
+	if (!net->mib.quic_statistics)
+		return -ENOMEM;
+
+	err = quic_proc_init(net);
+	if (err)
+		goto err_free_stats;
+
+	return 0;
+
+err_free_stats:
+	free_percpu(net->mib.quic_statistics);
+	return err;
+}
+
+static void __net_exit quic_exit_net(struct net *net)
+{
+	quic_proc_fini(net);
+	free_percpu(net->mib.quic_statistics);
+}
+
+static struct pernet_operations quic_proc_ops = {
+	.init = quic_init_net,
+	.exit = quic_exit_net,
+};
+
 static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 	.name		= "quic-crypto",
 	.owner		= THIS_MODULE,
@@ -1354,6 +1393,12 @@ static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 
 static int __init quic_register(void)
 {
+	int err;
+
+	err = register_pernet_subsys(&quic_proc_ops);
+	if (err)
+		return err;
+
 	udp_register_ulp(&quic_ulp_ops);
 	return 0;
 }
@@ -1361,6 +1406,7 @@ static int __init quic_register(void)
 static void __exit quic_unregister(void)
 {
 	udp_unregister_ulp(&quic_ulp_ops);
+	unregister_pernet_subsys(&quic_proc_ops);
 }
 
 module_init(quic_register);
diff --git a/net/quic/quic_proc.c b/net/quic/quic_proc.c
new file mode 100644
index 000000000000..cb4fe7a589b5
--- /dev/null
+++ b/net/quic/quic_proc.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Meta Platforms, Inc. */
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <net/snmp.h>
+#include <net/quic.h>
+
+#ifdef CONFIG_PROC_FS
+static const struct snmp_mib quic_mib_list[] = {
+	SNMP_MIB_ITEM("QuicCurrTxSw", LINUX_MIB_QUICCURRTXSW),
+	SNMP_MIB_ITEM("QuicTxSw", LINUX_MIB_QUICTXSW),
+	SNMP_MIB_ITEM("QuicTxSwError", LINUX_MIB_QUICTXSWERROR),
+	SNMP_MIB_SENTINEL
+};
+
+static int quic_statistics_seq_show(struct seq_file *seq, void *v)
+{
+	unsigned long buf[LINUX_MIB_QUICMAX] = {};
+	struct net *net = seq->private;
+	int i;
+
+	snmp_get_cpu_field_batch(buf, quic_mib_list, net->mib.quic_statistics);
+	for (i = 0; quic_mib_list[i].name; i++)
+		seq_printf(seq, "%-32s\t%lu\n", quic_mib_list[i].name, buf[i]);
+
+	return 0;
+}
+#endif
+
+int __net_init quic_proc_init(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	if (!proc_create_net_single("quic_stat", 0444, net->proc_net,
+				    quic_statistics_seq_show, NULL))
+		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
+
+	return 0;
+}
+
+void __net_exit quic_proc_fini(struct net *net)
+{
+	remove_proc_entry("quic_stat", net->proc_net);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next 6/6] Add self tests for ULP operations, flow setup and crypto tests
  2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (4 preceding siblings ...)
  2022-08-16 18:11   ` [net-next 5/6] Add flow counters and Tx processing error counter Adel Abouchaev
@ 2022-08-16 18:11   ` Adel Abouchaev
  2022-08-17  8:09   ` [net-next 0/6] net: support QUIC crypto Bagas Sanjaya
  6 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-16 18:11 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add self tests for ULP operations, flow setup and crypto tests.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Restored the test build. Changed the QUIC context reference variable
names for the keys and iv to match the uAPI.

Updated alignment, added SPDX license line.

v3: Added Chacha20-Poly1305 test.
---
 tools/testing/selftests/net/.gitignore |    3 +-
 tools/testing/selftests/net/Makefile   |    3 +-
 tools/testing/selftests/net/quic.c     | 1153 ++++++++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   46 +
 4 files changed, 1203 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 892306bdb47d..134b50f2ceb9 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -38,4 +38,5 @@ ioam6_parser
 toeplitz
 tun
 cmsg_sender
-unix_connect
\ No newline at end of file
+unix_connect
+quic
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index e2dfef8b78a7..e107efc84baf 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -42,6 +42,7 @@ TEST_PROGS += arp_ndisc_evict_nocarrier.sh
 TEST_PROGS += ndisc_unsolicited_na_test.sh
 TEST_PROGS += arp_ndisc_untracked_subnets.sh
 TEST_PROGS += stress_reuseport_listen.sh
+TEST_PROGS += quic.sh
 TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
 TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
 TEST_GEN_FILES =  socket nettest
@@ -57,7 +58,7 @@ TEST_GEN_FILES += ipsec
 TEST_GEN_FILES += ioam6_parser
 TEST_GEN_FILES += gro
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
-TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun
+TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun quic
 TEST_GEN_FILES += toeplitz
 TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
diff --git a/tools/testing/selftests/net/quic.c b/tools/testing/selftests/net/quic.c
new file mode 100644
index 000000000000..2aa5e1564f5f
--- /dev/null
+++ b/tools/testing/selftests/net/quic.c
@@ -0,0 +1,1153 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/limits.h>
+#include <linux/quic.h>
+#include <linux/socket.h>
+#include <linux/tls.h>
+#include <linux/tcp.h>
+#include <linux/types.h>
+#include <linux/udp.h>
+
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/sendfile.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+
+#include "../kselftest_harness.h"
+
+#define UDP_ULP		105
+
+#ifndef SOL_UDP
+#define SOL_UDP		17
+#endif
+
+// 1. QUIC ULP Registration Test
+
+FIXTURE(quic_ulp)
+{
+	int sfd;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_ulp)
+{
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv4)
+{
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7101,
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv6)
+{
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7102,
+};
+
+FIXTURE_SETUP(quic_ulp)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+FIXTURE_TEARDOWN(quic_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_nonexistent_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "nonexistent", sizeof("nonexistent")), -1);
+	// If UDP_ULP option is not present, the error would be ENOPROTOOPT.
+	ASSERT_EQ(errno, ENOENT);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_quic_crypto_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+// 2. QUIC Data Path Operation Tests
+
+#define DO_NOT_SETUP_FLOW 0
+#define SETUP_FLOW 1
+
+#define DO_NOT_USE_CLIENT 0
+#define USE_CLIENT 1
+
+FIXTURE(quic_data)
+{
+	int sfd, c1fd, c2fd;
+	socklen_t len_c1;
+	socklen_t len_c2;
+	socklen_t len_s;
+
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_1;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_2;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_1_net_ns_fd;
+	int client_2_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_data)
+{
+	unsigned int af_client_1;
+	char *client_1_address;
+	unsigned short client_1_port;
+	uint8_t conn_id_1[8];
+	uint8_t conn_1_key[16];
+	uint8_t conn_1_iv[12];
+	uint8_t conn_1_hdr_key[16];
+	size_t conn_id_1_len;
+	bool setup_flow_1;
+	bool use_client_1;
+	unsigned int af_client_2;
+	char *client_2_address;
+	unsigned short client_2_port;
+	uint8_t conn_id_2[8];
+	uint8_t conn_2_key[16];
+	uint8_t conn_2_iv[12];
+	uint8_t conn_2_hdr_key[16];
+	size_t conn_id_2_len;
+	bool setup_flow_2;
+	bool use_client_2;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv4)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.1",
+	.client_1_port = 6667,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6668,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	//.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 6669,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_two_conns)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.1",
+	.client_1_port = 6670,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6671,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6672,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv4_one_conn)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.3",
+	.client_1_port = 6676,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6676,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6677,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv6_one_conn)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.3",
+	.client_1_port = 6678,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6678,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6679,
+};
+
+FIXTURE_SETUP(quic_data)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client_1 == AF_INET) {
+		self->len_c1 = sizeof(self->client_1.addr);
+		self->client_1.addr.sin_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr.sin_addr);
+		self->client_1.addr.sin_port = htons(variant->client_1_port);
+	} else {
+		self->len_c1 = sizeof(self->client_1.addr6);
+		self->client_1.addr6.sin6_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr6.sin6_addr);
+		self->client_1.addr6.sin6_port = htons(variant->client_1_port);
+	}
+
+	if (variant->af_client_2 == AF_INET) {
+		self->len_c2 = sizeof(self->client_2.addr);
+		self->client_2.addr.sin_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr.sin_addr);
+		self->client_2.addr.sin_port = htons(variant->client_2_port);
+	} else {
+		self->len_c2 = sizeof(self->client_2.addr6);
+		self->client_2.addr6.sin6_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr6.sin6_addr);
+		self->client_2.addr6.sin6_port = htons(variant->client_2_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_1_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_1_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns12");
+	self->client_2_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_2_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		self->c1fd = socket(variant->af_client_1, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c1fd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client_1 == AF_INET) {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr,
+					      &self->len_c1), 0);
+		} else {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr6,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr6,
+					      &self->len_c1), 0);
+		}
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		self->c2fd = socket(variant->af_client_2, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c2fd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client_2 == AF_INET) {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr,
+					      &self->len_c2), 0);
+		} else {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr6,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr6,
+					      &self->len_c2), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_data)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+	close(self->c1fd);
+	ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+	close(self->c2fd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_data, send_fail_no_flow)
+{
+	char const *test_str = "test_read";
+	int send_len = 10;
+
+	ASSERT_EQ(strlen(test_str) + 1, send_len);
+	EXPECT_EQ(sendto(self->sfd, test_str, send_len, 0,
+			 &self->client_1.addr, self->len_c1), -1);
+};
+
+TEST_F(quic_data, encrypt_two_conn_gso_1200_iov_2_size_9000_aesgcm128)
+{
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_connection_info conn_1_info;
+	struct quic_connection_info conn_2_info;
+	struct quic_tx_ancillary_data *anc_data;
+	socklen_t recv_addr_len_1;
+	socklen_t recv_addr_len_2;
+	struct cmsghdr *cmsg_hdr;
+	int frag_size = 1200;
+	int send_len = 9000;
+	struct iovec iov[2];
+	int msg_len = 4500;
+	struct msghdr msg;
+	char *test_str_1;
+	char *test_str_2;
+	char *buf_1;
+	char *buf_2;
+	int i;
+
+	test_str_1 = (char *)malloc(9000);
+	test_str_2 = (char *)malloc(9000);
+	memset(test_str_1, 0, 9000);
+	memset(test_str_2, 0, 9000);
+
+	buf_1 = (char *)malloc(10000);
+	buf_2 = (char *)malloc(10000);
+	for (i = 0; i < 9000; i += (1200 - 16)) {
+		test_str_1[i] = 0x40;
+		memcpy(&test_str_1[i + 1], &variant->conn_id_1,
+		       variant->conn_id_1_len);
+		test_str_1[i + 1 + variant->conn_id_1_len] = 0xca;
+
+		test_str_2[i] = 0x40;
+		memcpy(&test_str_2[i + 1], &variant->conn_id_2,
+		       variant->conn_id_2_len);
+		test_str_2[i + 1 + variant->conn_id_2_len] = 0xca;
+	}
+
+	// program the connection into the offload
+	conn_1_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_1_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_1_info.key.conn_id_length = variant->conn_id_1_len;
+	memcpy(conn_1_info.key.conn_id,
+	       &variant->conn_id_1,
+	       variant->conn_id_1_len);
+
+	conn_2_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_2_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_2_info.key.conn_id_length = variant->conn_id_2_len;
+	memcpy(conn_2_info.key.conn_id,
+	       &variant->conn_id_2,
+	       variant->conn_id_2_len);
+
+	memcpy(&conn_1_info.aes_gcm_128.payload_key,
+	       &variant->conn_1_key, 16);
+	memcpy(&conn_1_info.aes_gcm_128.payload_iv,
+	       &variant->conn_1_iv, 12);
+	memcpy(&conn_1_info.aes_gcm_128.header_key,
+	       &variant->conn_1_hdr_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_key,
+	       &variant->conn_2_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_iv,
+	       &variant->conn_2_iv, 12);
+	memcpy(&conn_2_info.aes_gcm_128.header_key,
+	       &variant->conn_2_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+
+	if (variant->setup_flow_1)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)), 0);
+
+	if (variant->setup_flow_2)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)), 0);
+
+	recv_addr_len_1 = self->len_c1;
+	recv_addr_len_2 = self->len_c2;
+
+	iov[0].iov_base = test_str_1;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_1 + 4500;
+	iov[1].iov_len = msg_len;
+
+	msg.msg_name = (self->client_1.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_1.addr
+		       : (void *)&self->client_1.addr6;
+	msg.msg_namelen = self->len_c1;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->flags = 0;
+	anc_data->conn_id_length = variant->conn_id_1_len;
+
+	if (variant->use_client_1)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	iov[0].iov_base = test_str_2;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_2 + 4500;
+	iov[1].iov_len = msg_len;
+	msg.msg_name = (self->client_2.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_2.addr
+		       : (void *)&self->client_2.addr6;
+	msg.msg_namelen = self->len_c2;
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_2_len;
+	anc_data->flags = 0;
+
+	if (variant->use_client_2)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
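+	/*
+	 * 9000 plaintext bytes in 1184-byte chunks yield seven full 1200-byte
+	 * ciphertext datagrams plus a 728-byte tail, each chunk growing by
+	 * the 16-byte AEAD tag.
+	 */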
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		if (variant->af_client_1 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr,
+						   &recv_addr_len_1),
+					  1200);
+				// Validate framing is intact.
+				EXPECT_EQ(memcmp((void *)buf_1 + 1,
+						 &variant->conn_id_1,
+						 variant->conn_id_1_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr6,
+						   &recv_addr_len_1),
+					1200);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr6,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_1, test_str_1, send_len), 0);
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		if (variant->af_client_2 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr6,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr6,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_2, test_str_2, send_len), 0);
+	}
+
+	if (variant->use_client_1 && variant->use_client_2)
+		EXPECT_NE(memcmp(buf_1, buf_2, send_len), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	if (variant->setup_flow_1) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)),
+			  0);
+	}
+	if (variant->setup_flow_2) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)),
+			  0);
+	}
+	free(test_str_1);
+	free(test_str_2);
+	free(buf_1);
+	free(buf_2);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+// 3. QUIC Encryption Tests
+
+FIXTURE(quic_crypto)
+{
+	int sfd, cfd;
+	socklen_t len_c;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_crypto)
+{
+	unsigned int af_client;
+	char *client_address;
+	unsigned short client_port;
+	uint32_t algo;
+	size_t conn_key_len;
+	uint8_t conn_id[8];
+	union {
+		uint8_t conn_key_16[16];
+		uint8_t conn_key_32[32];
+	} conn_key;
+	uint8_t conn_iv[12];
+	union {
+		uint8_t conn_hdr_key_16[16];
+		uint8_t conn_hdr_key_32[32];
+	} conn_hdr_key;
+	size_t conn_id_len;
+	bool setup_flow;
+	bool use_client;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+	char plain[128];
+	size_t plain_len;
+	char match[128];
+	size_t match_len;
+	uint32_t next_pkt_num;
+};
+
+FIXTURE_SETUP(quic_crypto)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client == AF_INET) {
+		self->len_c = sizeof(self->client.addr);
+		self->client.addr.sin_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr.sin_addr);
+		self->client.addr.sin_port = htons(variant->client_port);
+	} else {
+		self->len_c = sizeof(self->client.addr6);
+		self->client.addr6.sin6_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr6.sin6_addr);
+		self->client.addr6.sin6_port = htons(variant->client_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client) {
+		ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+		self->cfd = socket(variant->af_client, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->cfd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client == AF_INET) {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr,
+					      &self->len_c), 0);
+		} else {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr6,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr6,
+					      &self->len_c), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s),
+			  0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s),
+			  0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_crypto)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	close(self->cfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4_aes_gcm_128)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7667,
+	.algo = TLS_CIPHER_AES_GCM_128,
+	.conn_key_len = 16,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {
+		.conn_key_16 = {0x87, 0x71, 0xea, 0x1d,
+				0xfb, 0xbe, 0x7a, 0x45,
+				0xbb, 0xe2, 0x7e, 0xbc,
+				0x0b, 0x53, 0x94, 0x99
+		},
+	},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {
+		.conn_hdr_key_16 = {0xc9, 0x8e, 0xfd, 0xf2,
+				    0x0b, 0x64, 0x8c, 0x57,
+				    0xb5, 0x0a, 0xb2, 0xd2,
+				    0x21, 0xd3, 0x66, 0xa5},
+	},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7669,
+	.plain = { 0x40, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0xca,
+		   // payload
+		   0x02, 0x80, 0xde, 0x40, 0x39, 0x40, 0xf6, 0x00,
+		   0x01, 0x0b, 0x00, 0x0f, 0x65, 0x63, 0x68, 0x6f,
+		   0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+		   0x37, 0x38, 0x39
+	},
+	.plain_len = 37,
+	.match = {
+		   0x46, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0x1c, 0x44, 0xb8, 0x41, 0xbb, 0xcf, 0x6e,
+		   0x0a, 0x2a, 0x24, 0xfb, 0xb4, 0x79, 0x62, 0xea,
+		   0x59, 0x38, 0x1a, 0x0e, 0x50, 0x1e, 0x59, 0xed,
+		   0x3f, 0x8e, 0x7e, 0x5a, 0x70, 0xe4, 0x2a, 0xbc,
+		   0x2a, 0xfa, 0x2b, 0x54, 0xeb, 0x89, 0xc3, 0x2c,
+		   0xb6, 0x8c, 0x1e, 0xab, 0x2d
+	},
+	.match_len = 53,
+	.next_pkt_num = 0x0d65c9,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4_chacha20_poly1305)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7801,
+	.algo = TLS_CIPHER_CHACHA20_POLY1305,
+	.conn_key_len = 32,
+	.conn_id = {},
+	.conn_id_len = 0,
+	.conn_key = {
+		.conn_key_32 = {
+			0x3b, 0xfc, 0xdd, 0xd7, 0x2b, 0xcf, 0x02, 0x54,
+			0x1d, 0x7f, 0xa0, 0xdd, 0x1f, 0x5f, 0x9e, 0xee,
+			0xa8, 0x17, 0xe0, 0x9a, 0x69, 0x63, 0xa0, 0xe6,
+			0xc7, 0xdf, 0x0f, 0x9a, 0x1b, 0xab, 0x90, 0xf2,
+		},
+	},
+	.conn_iv = {
+		0xa6, 0xb5, 0xbc, 0x6a, 0xb7, 0xda, 0xfc, 0xe3,
+		0x0f, 0xff, 0xf5, 0xdd,
+	},
+	.conn_hdr_key = {
+		.conn_hdr_key_32 = {
+			0xd6, 0x59, 0x76, 0x0d, 0x2b, 0xa4, 0x34, 0xa2,
+			0x26, 0xfd, 0x37, 0xb3, 0x5c, 0x69, 0xe2, 0xda,
+			0x82, 0x11, 0xd1, 0x0c, 0x4f, 0x12, 0x53, 0x87,
+			0x87, 0xd6, 0x56, 0x45, 0xd5, 0xd1, 0xb8, 0xe2,
+		},
+	},
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7802,
+	.plain = { 0x42, 0x00, 0xbf, 0xf4, 0x01 },
+	.plain_len = 5,
+	.match = { 0x55, 0x58, 0xb1, 0xc6, 0x0a, 0xe7, 0xb6, 0xb9,
+		   0x32, 0xbc, 0x27, 0xd7, 0x86, 0xf4, 0xbc, 0x2b,
+		   0xb2, 0x0f, 0x21, 0x62, 0xba },
+	.match_len = 21,
+	.next_pkt_num = 0x2700bff5,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6_aes_gcm_128)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7673,
+	.algo = TLS_CIPHER_AES_GCM_128,
+	.conn_key_len = 16,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {
+		.conn_key_16 = {0x87, 0x71, 0xea, 0x1d,
+				0xfb, 0xbe, 0x7a, 0x45,
+				0xbb, 0xe2, 0x7e, 0xbc,
+				0x0b, 0x53, 0x94, 0x99
+		},
+	},
+	.conn_iv = {0x3a, 0xa7, 0x46, 0x72, 0xe9, 0x83, 0x6b, 0x55, 0xda,
+		0x66, 0x7b, 0xda},
+	.conn_hdr_key = {
+		.conn_hdr_key_16 = {0xc9, 0x8e, 0xfd, 0xf2,
+				    0x0b, 0x64, 0x8c, 0x57,
+				    0xb5, 0x0a, 0xb2, 0xd2,
+				    0x21, 0xd3, 0x66, 0xa5},
+	},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7675,
+	.plain = { 0x40, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0xca,
+		   // Payload
+		   0x02, 0x80, 0xde, 0x40, 0x39, 0x40, 0xf6, 0x00,
+		   0x01, 0x0b, 0x00, 0x0f, 0x65, 0x63, 0x68, 0x6f,
+		   0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+		   0x37, 0x38, 0x39
+	},
+	.plain_len = 37,
+	.match = {
+		   0x46, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0x1c, 0x44, 0xb8, 0x41, 0xbb, 0xcf, 0x6e,
+		   0x0a, 0x2a, 0x24, 0xfb, 0xb4, 0x79, 0x62, 0xea,
+		   0x59, 0x38, 0x1a, 0x0e, 0x50, 0x1e, 0x59, 0xed,
+		   0x3f, 0x8e, 0x7e, 0x5a, 0x70, 0xe4, 0x2a, 0xbc,
+		   0x2a, 0xfa, 0x2b, 0x54, 0xeb, 0x89, 0xc3, 0x2c,
+		   0xb6, 0x8c, 0x1e, 0xab, 0x2d
+	},
+	.match_len = 53,
+	.next_pkt_num = 0x0d65c9,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6_chacha20_poly1305)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7803,
+	.algo = TLS_CIPHER_CHACHA20_POLY1305,
+	.conn_key_len = 32,
+	.conn_id = {},
+	.conn_id_len = 0,
+	.conn_key = {
+		.conn_key_32 = {
+			0x3b, 0xfc, 0xdd, 0xd7, 0x2b, 0xcf, 0x02, 0x54,
+			0x1d, 0x7f, 0xa0, 0xdd, 0x1f, 0x5f, 0x9e, 0xee,
+			0xa8, 0x17, 0xe0, 0x9a, 0x69, 0x63, 0xa0, 0xe6,
+			0xc7, 0xdf, 0x0f, 0x9a, 0x1b, 0xab, 0x90, 0xf2,
+		},
+	},
+	.conn_iv = {
+		0xa6, 0xb5, 0xbc, 0x6a, 0xb7, 0xda, 0xfc, 0xe3,
+		0x0f, 0xff, 0xf5, 0xdd,
+	},
+	.conn_hdr_key = {
+		.conn_hdr_key_32 = {
+			0xd6, 0x59, 0x76, 0x0d, 0x2b, 0xa4, 0x34, 0xa2,
+			0x26, 0xfd, 0x37, 0xb3, 0x5c, 0x69, 0xe2, 0xda,
+			0x82, 0x11, 0xd1, 0x0c, 0x4f, 0x12, 0x53, 0x87,
+			0x87, 0xd6, 0x56, 0x45, 0xd5, 0xd1, 0xb8, 0xe2,
+		},
+	},
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7804,
+	.plain = { 0x42, 0x00, 0xbf, 0xf4, 0x01 },
+	.plain_len = 5,
+	.match = { 0x55, 0x58, 0xb1, 0xc6, 0x0a, 0xe7, 0xb6, 0xb9,
+		   0x32, 0xbc, 0x27, 0xd7, 0x86, 0xf4, 0xbc, 0x2b,
+		   0xb2, 0x0f, 0x21, 0x62, 0xba },
+	.match_len = 21,
+	.next_pkt_num = 0x2700bff5,
+};
+
+TEST_F(quic_crypto, encrypt_test_vector_single_flow_gso_in_control)
+{
+	uint8_t cmsg_buf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))
+			 + CMSG_SPACE(sizeof(uint16_t))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	uint16_t frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	int wrong_frag_size = 26;
+	socklen_t recv_addr_len;
+	struct iovec iov;
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(9000);
+	conn_info.cipher_type = variant->algo;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	ASSERT_TRUE(variant->algo == TLS_CIPHER_AES_GCM_128 ||
+		    variant->algo == TLS_CIPHER_CHACHA20_POLY1305);
+	switch (variant->algo) {
+	case TLS_CIPHER_AES_GCM_128:
+		memcpy(&conn_info.aes_gcm_128.payload_key,
+		       &variant->conn_key, 16);
+		memcpy(&conn_info.aes_gcm_128.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.aes_gcm_128.header_key,
+		       &variant->conn_hdr_key, 16);
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		memcpy(&conn_info.chacha20_poly1305.payload_key,
+		       &variant->conn_key, 32);
+		memcpy(&conn_info.chacha20_poly1305.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.chacha20_poly1305.header_key,
+		       &variant->conn_hdr_key, 32);
+		break;
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
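+	/*
+	 * Set a bogus GSO size via setsockopt(); the UDP_SEGMENT value passed
+	 * in ancillary data below is expected to take precedence.
+	 */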
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &wrong_frag_size,
+			     sizeof(wrong_frag_size)), 0);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+	iov.iov_base = (void *)variant->plain;
+	iov.iov_len = variant->plain_len;
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(struct quic_tx_ancillary_data));
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = variant->next_pkt_num;
+	anc_data->conn_id_length = variant->conn_id_len;
+	cmsg_hdr = CMSG_NXTHDR(&msg, cmsg_hdr);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_SEGMENT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+	memcpy(CMSG_DATA(cmsg_hdr), (void *)&frag_size, sizeof(frag_size));
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), variant->plain_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr, &recv_addr_len),
+			  variant->match_len);
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  variant->match_len);
+	}
+	EXPECT_EQ(memcmp(buf, variant->match, variant->match_len), 0);
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_crypto, encrypt_test_vector_single_flow_gso_in_setsockopt)
+{
+	uint8_t cmsg_buf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	socklen_t recv_addr_len;
+	struct iovec iov;
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(9000);
+	conn_info.cipher_type = variant->algo;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	ASSERT_TRUE(variant->algo == TLS_CIPHER_AES_GCM_128 ||
+		    variant->algo == TLS_CIPHER_CHACHA20_POLY1305);
+	switch (variant->algo) {
+	case TLS_CIPHER_AES_GCM_128:
+		memcpy(&conn_info.aes_gcm_128.payload_key,
+		       &variant->conn_key, 16);
+		memcpy(&conn_info.aes_gcm_128.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.aes_gcm_128.header_key,
+		       &variant->conn_hdr_key, 16);
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		memcpy(&conn_info.chacha20_poly1305.payload_key,
+		       &variant->conn_key, 32);
+		memcpy(&conn_info.chacha20_poly1305.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.chacha20_poly1305.header_key,
+		       &variant->conn_hdr_key, 32);
+		break;
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+	iov.iov_base = (void *)variant->plain;
+	iov.iov_len = variant->plain_len;
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(struct quic_tx_ancillary_data));
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = variant->next_pkt_num;
+	anc_data->conn_id_length = variant->conn_id_len;
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), variant->plain_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr, &recv_addr_len),
+			  variant->match_len);
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  variant->match_len);
+	}
+	EXPECT_EQ(memcmp(buf, variant->match, variant->match_len), 0);
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/quic.sh b/tools/testing/selftests/net/quic.sh
new file mode 100755
index 000000000000..8ff8bc494671
--- /dev/null
+++ b/tools/testing/selftests/net/quic.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+sudo ip netns add ns11
+sudo ip netns add ns12
+sudo ip netns add ns2
+sudo ip link add veth11 type veth peer name br-veth11
+sudo ip link add veth12 type veth peer name br-veth12
+sudo ip link add veth2 type veth peer name br-veth2
+sudo ip link set veth11 netns ns11
+sudo ip link set veth12 netns ns12
+sudo ip link set veth2 netns ns2
+sudo ip netns exec ns11 ip addr add 10.0.0.1/24 dev veth11
+sudo ip netns exec ns11 ip addr add ::ffff:10.0.0.1/96 dev veth11
+sudo ip netns exec ns11 ip addr add 2001::1/64 dev veth11
+sudo ip netns exec ns12 ip addr add 10.0.0.3/24 dev veth12
+sudo ip netns exec ns12 ip addr add ::ffff:10.0.0.3/96 dev veth12
+sudo ip netns exec ns12 ip addr add 2001::3/64 dev veth12
+sudo ip netns exec ns2 ip addr add 10.0.0.2/24 dev veth2
+sudo ip netns exec ns2 ip addr add ::ffff:10.0.0.2/96 dev veth2
+sudo ip netns exec ns2 ip addr add 2001::2/64 dev veth2
+sudo ip link add name br1 type bridge forward_delay 0
+sudo ip link set br1 up
+sudo ip link set br-veth11 up
+sudo ip link set br-veth12 up
+sudo ip link set br-veth2 up
+sudo ip netns exec ns11 ip link set veth11 up
+sudo ip netns exec ns12 ip link set veth12 up
+sudo ip netns exec ns2 ip link set veth2 up
+sudo ip link set br-veth11 master br1
+sudo ip link set br-veth12 master br1
+sudo ip link set br-veth2 master br1
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+
+printf "%s" "Waiting for bridge to start forwarding ..."
+while ! timeout 0.5 sudo ip netns exec ns2 ping -c 1 -n 2001::1 &> /dev/null
+do
+	printf "%c" "."
+done
+printf "\n%s\n"  "Bridge is operational"
+
+sudo ./quic
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+sudo ip netns delete ns2
+sudo ip netns delete ns12
+sudo ip netns delete ns11
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [net-next 0/6] net: support QUIC crypto
  2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (5 preceding siblings ...)
  2022-08-16 18:11   ` [net-next 6/6] Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
@ 2022-08-17  8:09   ` Bagas Sanjaya
  2022-08-17 18:49     ` Adel Abouchaev
  6 siblings, 1 reply; 77+ messages in thread
From: Bagas Sanjaya @ 2022-08-17  8:09 UTC (permalink / raw)
  To: Adel Abouchaev, kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

On 8/17/22 01:11, Adel Abouchaev wrote:
> QUIC requires end to end encryption of the data. The application usually
> prepares the data in clear text, encrypts and calls send() which implies
> multiple copies of the data before the packets hit the networking stack.
> Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
> pressure by reducing the number of copies.
> 
> The scope of kernel support is limited to the symmetric cryptography,
> leaving the handshake to the user space library. For QUIC in particular,
> the application packets that require symmetric cryptography are the 1RTT
> packets with short headers. Kernel will encrypt the application packets
> on transmission and decrypt on receive. This series implements Tx only,
> because in QUIC server applications Tx outweighs Rx by orders of
> magnitude.
> 
> Supporting the combination of QUIC and GSO requires the application to
> correctly place the data and the kernel to correctly slice it. The
> encryption process appends an arbitrary number of bytes (tag) to the end
> of the message to authenticate it. The GSO value should include this
> overhead, the offload would then subtract the tag size to parse the
> input on Tx before chunking and encrypting it.
> 
> With the kernel cryptography, the buffer copy operation is conjoined
> with the encryption operation. The memory bandwidth is reduced by 5-8%.
> When devices supporting QUIC encryption in hardware come to the market,
> we will be able to free further 7% of CPU utilization which is used
> today for crypto operations.
> 

Hmmm...

I can't cleanly apply this series on top of current net-next. Exactly
which commit is this series based on?

Also, I see two whitespace warnings when applying. Please fixup and resend.
When resending, don't forget to pass --base to git-format-patch(1).

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next 0/6] net: support QUIC crypto
  2022-08-17  8:09   ` [net-next 0/6] net: support QUIC crypto Bagas Sanjaya
@ 2022-08-17 18:49     ` Adel Abouchaev
  0 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-17 18:49 UTC (permalink / raw)
  To: Bagas Sanjaya, kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

The base commit for the branch I am using here is:

commit f86d1fbbe7858884d6754534a0afbb74fc30bc26 
(origin/net-next-upstream, net-next/master, net-next/main, net-next)
Merge: 526942b8134c 7c6327c77d50
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Aug 3 16:29:08 2022 -0700

Will fix the whitespaces and resubmit.

On 8/17/22 1:09 AM, Bagas Sanjaya wrote:
> On 8/17/22 01:11, Adel Abouchaev wrote:
>> QUIC requires end to end encryption of the data. The application usually
>> prepares the data in clear text, encrypts and calls send() which implies
>> multiple copies of the data before the packets hit the networking stack.
>> Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
>> pressure by reducing the number of copies.
>>
>> The scope of kernel support is limited to the symmetric cryptography,
>> leaving the handshake to the user space library. For QUIC in particular,
>> the application packets that require symmetric cryptography are the 1RTT
>> packets with short headers. Kernel will encrypt the application packets
>> on transmission and decrypt on receive. This series implements Tx only,
>> because in QUIC server applications Tx outweighs Rx by orders of
>> magnitude.
>>
>> Supporting the combination of QUIC and GSO requires the application to
>> correctly place the data and the kernel to correctly slice it. The
>> encryption process appends an arbitrary number of bytes (tag) to the end
>> of the message to authenticate it. The GSO value should include this
>> overhead, the offload would then subtract the tag size to parse the
>> input on Tx before chunking and encrypting it.
>>
>> With the kernel cryptography, the buffer copy operation is conjoined
>> with the encryption operation. The memory bandwidth is reduced by 5-8%.
>> When devices supporting QUIC encryption in hardware come to the market,
>> we will be able to free further 7% of CPU utilization which is used
>> today for crypto operations.
>>
> Hmmm...
>
> I can't cleanly apply this series on top of current net-next. Exactly
> which commit is this series based on?
>
> Also, I see two whitespace warnings when applying. Please fixup and resend.
> When resending, don't forget to pass --base to git-format-patch(1).
>
> Thanks.
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [net-next v2 0/6] net: support QUIC crypto
       [not found] <Adel Abouchaev <adel.abushaev@gmail.com>
                   ` (3 preceding siblings ...)
  2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
@ 2022-08-17 20:09 ` Adel Abouchaev
  2022-08-17 20:09   ` [net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
                     ` (7 more replies)
  2022-08-24 18:43 ` [net-next] Fix reinitialization of TEST_PROGS in net self tests Adel Abouchaev
                   ` (2 subsequent siblings)
  7 siblings, 8 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-17 20:09 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

QUIC requires end to end encryption of the data. The application usually
prepares the data in clear text, encrypts and calls send() which implies
multiple copies of the data before the packets hit the networking stack.
Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
pressure by reducing the number of copies.

The scope of kernel support is limited to the symmetric cryptography,
leaving the handshake to the user space library. For QUIC in particular,
the application packets that require symmetric cryptography are the 1RTT
packets with short headers. Kernel will encrypt the application packets
on transmission and decrypt on receive. This series implements Tx only,
because in QUIC server applications Tx outweighs Rx by orders of
magnitude.

Supporting the combination of QUIC and GSO requires the application to
correctly place the data and the kernel to correctly slice it. The
encryption process appends an arbitrary number of bytes (tag) to the end
of the message to authenticate it. The GSO value should include this
overhead, the offload would then subtract the tag size to parse the
input on Tx before chunking and encrypting it.
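
As an illustration only (the numbers are examples, not part of the series
itself), with a 16-byte AEAD tag the sizing works out roughly as:

	int tag_len = 16;			/* e.g. AES-GCM-128 */
	int gso_size = 1200;			/* ciphertext per datagram */
	int plain_stride = gso_size - tag_len;	/* plaintext chunk every 1184 bytes */

	setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size));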

With the kernel cryptography, the buffer copy operation is conjoined
with the encryption operation. The memory bandwidth is reduced by 5-8%.
When devices supporting QUIC encryption in hardware come to the market,
we will be able to free further 7% of CPU utilization which is used
today for crypto operations.

Adel Abouchaev (6):
  Documentation on QUIC kernel Tx crypto.
  Define QUIC specific constants, control and data plane structures
  Add UDP ULP operations, initialization and handling prototype
    functions.
  Implement QUIC offload functions
  Add flow counters and Tx processing error counter
  Add self tests for ULP operations, flow setup and crypto tests

 Documentation/networking/index.rst     |    1 +
 Documentation/networking/quic.rst      |  185 ++++
 include/net/inet_sock.h                |    2 +
 include/net/netns/mib.h                |    3 +
 include/net/quic.h                     |   63 ++
 include/net/snmp.h                     |    6 +
 include/net/udp.h                      |   33 +
 include/uapi/linux/quic.h              |   60 +
 include/uapi/linux/snmp.h              |    9 +
 include/uapi/linux/udp.h               |    4 +
 net/Kconfig                            |    1 +
 net/Makefile                           |    1 +
 net/ipv4/Makefile                      |    3 +-
 net/ipv4/udp.c                         |   15 +
 net/ipv4/udp_ulp.c                     |  192 ++++
 net/quic/Kconfig                       |   16 +
 net/quic/Makefile                      |    8 +
 net/quic/quic_main.c                   | 1417 ++++++++++++++++++++++++
 net/quic/quic_proc.c                   |   45 +
 tools/testing/selftests/net/.gitignore |    4 +-
 tools/testing/selftests/net/Makefile   |    3 +-
 tools/testing/selftests/net/quic.c     | 1153 +++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   46 +
 23 files changed, 3267 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/quic.rst
 create mode 100644 include/net/quic.h
 create mode 100644 include/uapi/linux/quic.h
 create mode 100644 net/ipv4/udp_ulp.c
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c
 create mode 100644 net/quic/quic_proc.c
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh


base-commit: fd78d07c7c35de260eb89f1be4a1e7487b8092ad
-- 
2.30.2


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [net-next v2 1/6] Documentation on QUIC kernel Tx crypto.
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
@ 2022-08-17 20:09   ` Adel Abouchaev
  2022-08-18  2:53     ` Bagas Sanjaya
  2022-08-17 20:09   ` [net-next v2 2/6] Define QUIC specific constants, control and data plane structures Adel Abouchaev
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-17 20:09 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, kernel test robot

Add documentation for kernel QUIC code.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Added quic.rst reference to the index.rst file; fixed indentation in
the quic.rst file.
Reported-by: kernel test robot <lkp@intel.com>

Added SPDX license GPL 2.0.
v2: Removed whitespace at EOF.
---
 Documentation/networking/index.rst |   1 +
 Documentation/networking/quic.rst  | 185 +++++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)
 create mode 100644 Documentation/networking/quic.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 03b215bddde8..656fa1dac26b 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -90,6 +90,7 @@ Contents:
    plip
    ppp_generic
    proc_net_tcp
+   quic
    radiotap-headers
    rds
    regulatory
diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..ed506b4d6bdd
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,185 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========
+KERNEL QUIC
+===========
+
+Overview
+========
+
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 for more information on QUIC.
+
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are the 1RTT
+packets of a QUIC connection. Encryption of all other packets is still done
+by the QUIC library in user space.
+
+
+
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+
+QUIC connection originates and terminates in the application, using one of many
+available QUIC libraries. The code instantiates QUIC client and QUIC server in
+some form and configures them to use certain addresses and ports for the
+source and destination. The client and server negotiate the set of keys to
+protect the communication during different phases of the connection, maintain
+the connection and perform congestion control.
+
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow that should be encrypted by the kernel needs to be registered with
+the kernel using the socket API. A setsockopt() call on the socket creates
+an association between the QUIC connection ID of the flow and the encryption
+parameters for the crypto operations:
+
+.. code-block:: c
+
+	struct quic_connection_info conn_info;
+	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+	const size_t conn_id_len = sizeof(conn_id);
+	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			    0x08, 0x09, 0x0a, 0x0b};
+	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
+				};
+
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = 5;
+	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+				      - conn_id_len],
+	       &conn_id, conn_id_len);
+
+	memcpy(&conn_info.aes_gcm_128.payload_key, conn_key,
+	       sizeof(conn_key));
+	memcpy(&conn_info.aes_gcm_128.payload_iv, conn_iv, sizeof(conn_iv));
+	memcpy(&conn_info.aes_gcm_128.header_key, conn_hdr_key,
+	       sizeof(conn_hdr_key));
+
+	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+
+Requesting to remove QUIC Tx kernel crypto offload control messages
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. To explicitly remove the
+offload for a connection during the lifetime of the socket, the process is
+similar to adding the flow. Only the connection ID and its length need to be
+supplied to remove the connection from the offload:
+
+.. code-block:: c
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = 5;
+	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+				      - conn_id_len],
+	       &conn_id, conn_id_len);
+	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+Sending QUIC application data
+-----------------------------
+
+For QUIC Tx encryption offload, the application should use the sendmsg()
+socket call and provide ancillary data carrying the connection ID length and
+offload flags, so that the kernel can perform the encryption and, if
+requested, GSO.
+
+.. code-block:: c
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_tx_ancillary_data *anc_data;
+	size_t quic_data_len = 4500;
+	struct cmsghdr *cmsg_hdr;
+	char quic_data[9000];
+	struct iovec iov[2];
+	int send_len = 9000;
+	struct msghdr msg;
+	int err;
+
+	iov[0].iov_base = quic_data;
+	iov[0].iov_len = quic_data_len;
+	iov[1].iov_base = quic_data + 4500;
+	iov[1].iov_len = quic_data_len;
+
+	if (client.addr.sin_family == AF_INET) {
+		msg.msg_name = &client.addr;
+		msg.msg_namelen = sizeof(client.addr);
+	} else {
+		msg.msg_name = &client.addr6;
+		msg.msg_namelen = sizeof(client.addr6);
+	}
+
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = conn_id_len;
+	err = sendmsg(fd, &msg, 0);
+
+The QUIC Tx offload in the kernel reads the data from user space and
+encrypts it into the ciphertext buffer within the same operation.
+
+
+Sending QUIC application data with GSO
+--------------------------------------
+
+When GSO is in use, the kernel will use the GSO fragment size as the target
+for ciphertext. The packets from the user space should align on the boundary
+of the GSO fragment size minus the size of the tag for the chosen cipher.
+For a GSO fragment size of 1200, the plaintext packets should follow each
+other at every 1184 bytes, given a tag size of 16. After the encryption, the
+rest of the UDP and IP stacks will use the configured GSO fragment size,
+which includes the trailing tag bytes.
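+
+As an illustrative sketch (mirroring the self tests added in this series,
+and reusing ``conn_id`` and ``conn_id_len`` from the earlier example), the
+application lays out its plaintext packets at a stride of the GSO fragment
+size minus the tag size, each chunk starting with a QUIC short header and
+the connection ID:
+
+.. code-block:: c
+
+	const int tag_len = 16;			/* AES-GCM-128 tag size */
+	const int frag_size = 1200;		/* ciphertext per datagram */
+	const int stride = frag_size - tag_len;	/* 1184-byte plaintext chunks */
+	char quic_data[9000];
+	size_t off;
+
+	for (off = 0; off < sizeof(quic_data); off += stride) {
+		quic_data[off] = 0x40;	/* QUIC short header, fixed bit set */
+		memcpy(&quic_data[off + 1], conn_id, conn_id_len);
+	}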
+
+To set up GSO fragmentation:
+
+.. code-block:: c
+
+	setsockopt(fd, SOL_UDP, UDP_SEGMENT, &frag_size,
+		   sizeof(frag_size));
+
+If the GSO fragment size is provided in ancillary data within the sendmsg()
+call, the value in ancillary data will take precedence over the segment size
+provided in setsockopt to split the payload into packets. This is consistent
+with the UDP stack behavior.
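+
+A sketch of supplying the fragment size in ancillary data, based on the self
+tests in this series and extending the sendmsg() example above with a second
+control message (``msg`` and ``cmsg_hdr`` are set up as shown earlier, and
+``cmsg_buf`` must be sized with CMSG_SPACE() for both control messages):
+
+.. code-block:: c
+
+	uint16_t frag_size = 1200;
+
+	cmsg_hdr = CMSG_NXTHDR(&msg, cmsg_hdr);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_SEGMENT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+	memcpy(CMSG_DATA(cmsg_hdr), &frag_size, sizeof(frag_size));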
+
+Integrating to userspace QUIC libraries
+---------------------------------------
+
+Integration with userspace QUIC libraries depends on the implementation of
+the QUIC protocol. For the MVFST library, the control plane is integrated
+into the handshake callbacks to properly configure the flows on the socket,
+and the data plane is integrated into the methods that perform encryption
+and send the packets to the batch scheduler for transmission on the socket.
+
+MVFST library can be found at https://github.com/facebookincubator/mvfst.
+
+Statistics
+==========
+
+QUIC Tx offload to the kernel exposes the following counters in
+``/proc/net/quic_stat``:
+
+- ``QuicCurrTxSw`` -
+  number of currently active kernel-offloaded QUIC connections
+- ``QuicTxSw`` -
+  cumulative total number of offloaded QUIC connections
+- ``QuicTxSwError`` -
+  cumulative total number of errors during QUIC Tx offload to the kernel
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v2 2/6] Define QUIC specific constants, control and data plane structures
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
  2022-08-17 20:09   ` [net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-08-17 20:09   ` Adel Abouchaev
  2022-08-17 20:09   ` [net-next v2 3/6] Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-17 20:09 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Define control and data plane structures that are passed in the control
plane for flow add/remove and in ancillary data during packet send. Define
constants to use within SOL_UDP to program QUIC sockets.
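
For reference, the per-sendmsg() ancillary data defined here is intended to
be filled from user space roughly as follows (illustrative sketch; the
values are placeholders taken from the self tests in this series):

	struct quic_tx_ancillary_data anc = {
		.next_pkt_num	= 0x0d65c9,
		.flags		= 0,
		.conn_id_length	= 8,
	};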

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 include/uapi/linux/quic.h | 60 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/udp.h  |  3 ++
 2 files changed, 63 insertions(+)
 create mode 100644 include/uapi/linux/quic.h

diff --git a/include/uapi/linux/quic.h b/include/uapi/linux/quic.h
new file mode 100644
index 000000000000..38c54ff62d37
--- /dev/null
+++ b/include/uapi/linux/quic.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_QUIC_H
+#define _UAPI_LINUX_QUIC_H
+
+#include <linux/types.h>
+#include <linux/tls.h>
+
+#define QUIC_MAX_CONNECTION_ID_SIZE 20
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_BYPASS_ENCRYPTION 0x01
+
+struct quic_tx_ancillary_data {
+	__aligned_u64	next_pkt_num;
+	__u8	flags;
+	__u8	conn_id_length;
+};
+
+struct quic_connection_info_key {
+	__u8	conn_id[QUIC_MAX_CONNECTION_ID_SIZE];
+	__u8	conn_id_length;
+};
+
+struct quic_aes_gcm_128 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+};
+
+struct quic_aes_gcm_256 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_256_IV_SIZE];
+};
+
+struct quic_aes_ccm_128 {
+	__u8	header_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_CCM_128_IV_SIZE];
+};
+
+struct quic_chacha20_poly1305 {
+	__u8	header_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE];
+};
+
+struct quic_connection_info {
+	__u16	cipher_type;
+	struct quic_connection_info_key		key;
+	union {
+		struct quic_aes_gcm_128 aes_gcm_128;
+		struct quic_aes_gcm_256 aes_gcm_256;
+		struct quic_aes_ccm_128 aes_ccm_128;
+		struct quic_chacha20_poly1305 chacha20_poly1305;
+	};
+};
+
+#endif
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 4828794efcf8..0ee4c598e70b 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,9 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
+#define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
+#define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v2 3/6] Add UDP ULP operations, initialization and handling prototype functions.
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
  2022-08-17 20:09   ` [net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
  2022-08-17 20:09   ` [net-next v2 2/6] Define QUIC specific constants, control and data plane structures Adel Abouchaev
@ 2022-08-17 20:09   ` Adel Abouchaev
  2022-08-17 20:09   ` [net-next v2 4/6] Implement QUIC offload functions Adel Abouchaev
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-17 20:09 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Define functions to add UDP ULP handling, registration with the UDP
protocol, and supporting data structures. Create a structure for the QUIC
ULP and add empty prototype functions to support it.
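
For context, an upper layer protocol would attach to this interface roughly
as follows (illustrative sketch only, not part of this patch; the init and
release callbacks are hypothetical):

	static struct udp_ulp_ops quic_ulp_ops = {
		.name		= "quic-crypto",
		.owner		= THIS_MODULE,
		.init		= quic_ulp_init,
		.release	= quic_ulp_release,
	};

	static int __init quic_init_module(void)
	{
		return udp_register_ulp(&quic_ulp_ops);
	}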

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Removed reference to net/quic/Kconfig from this patch into the next.

Fixed formatting around brackets.
---
 include/net/inet_sock.h  |   2 +
 include/net/udp.h        |  33 +++++++
 include/uapi/linux/udp.h |   1 +
 net/Makefile             |   1 +
 net/ipv4/Makefile        |   3 +-
 net/ipv4/udp.c           |   6 ++
 net/ipv4/udp_ulp.c       | 192 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 237 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/udp_ulp.c

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index bf5654ce711e..650e332bdb50 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -249,6 +249,8 @@ struct inet_sock {
 	__be32			mc_addr;
 	struct ip_mc_socklist __rcu	*mc_list;
 	struct inet_cork_full	cork;
+	const struct udp_ulp_ops	*udp_ulp_ops;
+	void __rcu		*ulp_data;
 };
 
 #define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
diff --git a/include/net/udp.h b/include/net/udp.h
index 5ee88ddf79c3..f22ebabbb186 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -523,4 +523,37 @@ struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
 int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 #endif
 
+/*
+ * Interface for adding Upper Level Protocols over UDP
+ */
+
+#define UDP_ULP_NAME_MAX	16
+#define UDP_ULP_MAX		128
+
+struct udp_ulp_ops {
+	struct list_head	list;
+
+	/* initialize ulp */
+	int (*init)(struct sock *sk);
+	/* cleanup ulp */
+	void (*release)(struct sock *sk);
+
+	char		name[UDP_ULP_NAME_MAX];
+	struct module	*owner;
+};
+
+int udp_register_ulp(struct udp_ulp_ops *type);
+void udp_unregister_ulp(struct udp_ulp_ops *type);
+int udp_set_ulp(struct sock *sk, const char *name);
+void udp_get_available_ulp(char *buf, size_t len);
+void udp_cleanup_ulp(struct sock *sk);
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval,
+		       unsigned int optlen);
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval,
+		       int __user *optlen);
+
+#define MODULE_ALIAS_UDP_ULP(name)\
+	__MODULE_INFO(alias, alias_userspace, name);\
+	__MODULE_INFO(alias, alias_udp_ulp, "udp-ulp-" name)
+
 #endif	/* _UDP_H */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 0ee4c598e70b..893691f0108a 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_ULP		105	/* Attach ULP to a UDP socket */
 #define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
 #define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
 #define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
diff --git a/net/Makefile b/net/Makefile
index fbfeb8a0bb37..28565bfe29cb 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -16,6 +16,7 @@ obj-y				+= ethernet/ 802/ sched/ netlink/ bpf/ ethtool/
 obj-$(CONFIG_NETFILTER)		+= netfilter/
 obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_TLS)		+= tls/
+obj-$(CONFIG_QUIC)		+= quic/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX_SCM)		+= unix/
 obj-y				+= ipv6/
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index bbdd9c44f14e..88d3baf4af95 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,8 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o
+	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o \
+	     udp_ulp.o
 
 obj-$(CONFIG_BPFILTER) += bpfilter/
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 34eda973bbf1..027c4513a9cd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2779,6 +2779,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->pcflag |= UDPLITE_RECV_CC;
 		break;
 
+	case UDP_ULP:
+		return udp_setsockopt_ulp(sk, optval, optlen);
+
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -2847,6 +2850,9 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
 		val = up->pcrlen;
 		break;
 
+	case UDP_ULP:
+		return udp_getsockopt_ulp(sk, optval, optlen);
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/udp_ulp.c b/net/ipv4/udp_ulp.c
new file mode 100644
index 000000000000..138818690151
--- /dev/null
+++ b/net/ipv4/udp_ulp.c
@@ -0,0 +1,192 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Pluggable UDP upper layer protocol support, based on pluggable TCP upper
+ * layer protocol support.
+ *
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwatson@fb.com>. All rights
+ * reserved.
+ * Copyright (c) 2021-2022, Meta Platforms, Inc. All rights reserved.
+ */
+
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/skmsg.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+
+static DEFINE_SPINLOCK(udp_ulp_list_lock);
+static LIST_HEAD(udp_ulp_list);
+
+/* Simple linear search, don't expect many entries! */
+static struct udp_ulp_ops *udp_ulp_find(const char *name)
+{
+	struct udp_ulp_ops *e;
+
+	list_for_each_entry_rcu(e, &udp_ulp_list, list,
+				lockdep_is_held(&udp_ulp_list_lock)) {
+		if (strcmp(e->name, name) == 0)
+			return e;
+	}
+
+	return NULL;
+}
+
+static const struct udp_ulp_ops *__udp_ulp_find_autoload(const char *name)
+{
+	const struct udp_ulp_ops *ulp = NULL;
+
+	rcu_read_lock();
+	ulp = udp_ulp_find(name);
+
+#ifdef CONFIG_MODULES
+	if (!ulp && capable(CAP_NET_ADMIN)) {
+		rcu_read_unlock();
+		request_module("udp-ulp-%s", name);
+		rcu_read_lock();
+		ulp = udp_ulp_find(name);
+	}
+#endif
+	if (!ulp || !try_module_get(ulp->owner))
+		ulp = NULL;
+
+	rcu_read_unlock();
+	return ulp;
+}
+
+/* Attach new upper layer protocol to the list
+ * of available protocols.
+ */
+int udp_register_ulp(struct udp_ulp_ops *ulp)
+{
+	int ret = 0;
+
+	spin_lock(&udp_ulp_list_lock);
+	if (udp_ulp_find(ulp->name))
+		ret = -EEXIST;
+	else
+		list_add_tail_rcu(&ulp->list, &udp_ulp_list);
+
+	spin_unlock(&udp_ulp_list_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(udp_register_ulp);
+
+void udp_unregister_ulp(struct udp_ulp_ops *ulp)
+{
+	spin_lock(&udp_ulp_list_lock);
+	list_del_rcu(&ulp->list);
+	spin_unlock(&udp_ulp_list_lock);
+
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(udp_unregister_ulp);
+
+void udp_cleanup_ulp(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	/* No sock_owned_by_me() check here as at the time the
+	 * stack calls this function, the socket is dead and
+	 * about to be destroyed.
+	 */
+	if (!inet->udp_ulp_ops)
+		return;
+
+	if (inet->udp_ulp_ops->release)
+		inet->udp_ulp_ops->release(sk);
+	module_put(inet->udp_ulp_ops->owner);
+
+	inet->udp_ulp_ops = NULL;
+}
+
+static int __udp_set_ulp(struct sock *sk, const struct udp_ulp_ops *ulp_ops)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int err;
+
+	err = -EEXIST;
+	if (inet->udp_ulp_ops)
+		goto out_err;
+
+	err = ulp_ops->init(sk);
+	if (err)
+		goto out_err;
+
+	inet->udp_ulp_ops = ulp_ops;
+	return 0;
+
+out_err:
+	module_put(ulp_ops->owner);
+	return err;
+}
+
+int udp_set_ulp(struct sock *sk, const char *name)
+{
+	struct sk_psock *psock = sk_psock_get(sk);
+	const struct udp_ulp_ops *ulp_ops;
+
+	if (psock) {
+		sk_psock_put(sk, psock);
+		return -EINVAL;
+	}
+
+	sock_owned_by_me(sk);
+	ulp_ops = __udp_ulp_find_autoload(name);
+	if (!ulp_ops)
+		return -ENOENT;
+
+	return __udp_set_ulp(sk, ulp_ops);
+}
+
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	char name[UDP_ULP_NAME_MAX];
+	int val, err;
+
+	if (!optlen || optlen > UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	val = strncpy_from_sockptr(name, optval, optlen);
+	if (val < 0)
+		return -EFAULT;
+
+	if (val == UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	name[val] = 0;
+	lock_sock(sk);
+	err = udp_set_ulp(sk, name);
+	release_sock(sk);
+	return err;
+}
+
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval, int __user *optlen)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int len;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	if (len < 0)
+		return -EINVAL;
+	len = min_t(unsigned int, len, UDP_ULP_NAME_MAX);
+
+	if (!inet->udp_ulp_ops) {
+		if (put_user(0, optlen))
+			return -EFAULT;
+		return 0;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, inet->udp_ulp_ops->name, len))
+		return -EFAULT;
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v2 4/6] Implement QUIC offload functions
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
                     ` (2 preceding siblings ...)
  2022-08-17 20:09   ` [net-next v2 3/6] Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
@ 2022-08-17 20:09   ` Adel Abouchaev
  2022-08-17 20:09   ` [net-next v2 5/6] Add flow counters and Tx processing error counter Adel Abouchaev
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-17 20:09 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add a connection hash to the context to support add and remove operations
on QUIC connections for the control plane, and lookups for the data
plane. Implement setsockopt() and add placeholders for adding and
deleting Tx connections.
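
For illustration, a minimal user space sketch of the intended usage,
following the uAPI structures and constants added by this series; the fd,
destination/iovec setup and raw key material are placeholders and error
handling is omitted:

  struct quic_connection_info info = {};
  struct quic_tx_ancillary_data tx = {};
  char cbuf[CMSG_SPACE(sizeof(tx))] = {};
  uint8_t cid[4] = { 0x11, 0x12, 0x13, 0x14 };
  struct msghdr msg = {};
  struct cmsghdr *cm;

  /* Attach the ULP to the UDP socket. */
  setsockopt(fd, SOL_UDP, UDP_ULP, "quic-crypto", sizeof("quic-crypto"));

  /* Install Tx keys for one connection ID. */
  info.cipher_type = TLS_CIPHER_AES_GCM_128;
  info.key.conn_id_length = sizeof(cid);
  memcpy(info.key.conn_id, cid, sizeof(cid));
  memcpy(info.aes_gcm_128.payload_key, payload_key,
         sizeof(info.aes_gcm_128.payload_key));
  memcpy(info.aes_gcm_128.payload_iv, payload_iv,
         sizeof(info.aes_gcm_128.payload_iv));
  memcpy(info.aes_gcm_128.header_key, header_key,
         sizeof(info.aes_gcm_128.header_key));
  setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &info, sizeof(info));

  /* Per sendmsg(), pass the connection ID length and the expected packet
   * number in UDP_QUIC_ENCRYPT ancillary data; msg_name and msg_iov are
   * set up as for a plain UDP sendmsg() and carry the plaintext 1-RTT
   * packet(s).  With GSO, an additional UDP_SEGMENT cmsg gives the
   * ciphertext segment size, from which the kernel subtracts the AEAD tag
   * before slicing the plaintext.
   */
  tx.conn_id_length = sizeof(cid);
  tx.next_pkt_num = 0;
  tx.flags = 0;

  msg.msg_control = cbuf;
  msg.msg_controllen = sizeof(cbuf);
  cm = CMSG_FIRSTHDR(&msg);
  cm->cmsg_level = IPPROTO_UDP;
  cm->cmsg_type = UDP_QUIC_ENCRYPT;
  cm->cmsg_len = CMSG_LEN(sizeof(tx));
  memcpy(CMSG_DATA(cm), &tx, sizeof(tx));

  sendmsg(fd, &msg, 0);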

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Added net/quic/Kconfig reference to net/Kconfig in this commit.

Initialized pointers with NULL instead of 0. Restricted the AES counter to
__le32. Added address space qualifiers to user space addresses. Removed
empty lines. Updated code alignment. Removed inlines.

v3: removed ITER_KVEC flag from iov_iter_kvec call.
v3: fixed Chacha20 encryption bug.
---
 include/net/quic.h   |   53 ++
 net/Kconfig          |    1 +
 net/ipv4/udp.c       |    9 +
 net/quic/Kconfig     |   16 +
 net/quic/Makefile    |    8 +
 net/quic/quic_main.c | 1371 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1458 insertions(+)
 create mode 100644 include/net/quic.h
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c

diff --git a/include/net/quic.h b/include/net/quic.h
new file mode 100644
index 000000000000..cafe01174e60
--- /dev/null
+++ b/include/net/quic.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef INCLUDE_NET_QUIC_H
+#define INCLUDE_NET_QUIC_H
+
+#include <linux/mutex.h>
+#include <linux/rhashtable.h>
+#include <linux/skmsg.h>
+#include <uapi/linux/quic.h>
+
+#define QUIC_MAX_SHORT_HEADER_SIZE      25
+#define QUIC_MAX_CONNECTION_ID_SIZE     20
+#define QUIC_HDR_MASK_SIZE              16
+#define QUIC_MAX_GSO_FRAGS              16
+
+// Maximum IV and nonce sizes should be in sync with supported ciphers.
+#define QUIC_CIPHER_MAX_IV_SIZE		12
+#define QUIC_CIPHER_MAX_NONCE_SIZE	16
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_ANCILLARY_FLAGS    (QUIC_BYPASS_ENCRYPTION)
+
+#define QUIC_MAX_IOVEC_SEGMENTS		8
+#define QUIC_MAX_SG_ALLOC_ELEMENTS	32
+#define QUIC_MAX_PLAIN_PAGES		16
+#define QUIC_MAX_CIPHER_PAGES_ORDER	4
+
+struct quic_internal_crypto_context {
+	struct quic_connection_info	conn_info;
+	struct crypto_skcipher		*header_tfm;
+	struct crypto_aead		*packet_aead;
+};
+
+struct quic_connection_rhash {
+	struct rhash_head			node;
+	struct quic_internal_crypto_context	crypto_ctx;
+	struct rcu_head				rcu;
+};
+
+struct quic_context {
+	struct proto		*sk_proto;
+	struct rhashtable	tx_connections;
+	struct scatterlist	sg_alloc[QUIC_MAX_SG_ALLOC_ELEMENTS];
+	struct page		*cipher_page;
+	/**
+	 * To synchronize concurrent sendmsg() requests through the same socket
+	 * and protect preallocated per-context memory.
+	 **/
+	struct mutex		sendmsg_mux;
+	struct rcu_head		rcu;
+};
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 6b78f695caa6..93e3b1308aec 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -63,6 +63,7 @@ menu "Networking options"
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/tls/Kconfig"
+source "net/quic/Kconfig"
 source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 source "net/smc/Kconfig"
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 027c4513a9cd..e7cbbea9d8d9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,6 +113,7 @@
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
+#include <uapi/linux/quic.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6_stubs.h>
 #endif
@@ -1011,6 +1012,14 @@ static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
 			return -EINVAL;
 		*gso_size = *(__u16 *)CMSG_DATA(cmsg);
 		return 0;
+	case UDP_QUIC_ENCRYPT:
+		/* This option is handled in UDP_ULP and is only checked
+		 * here for the bypass bit
+		 */
+		if (cmsg->cmsg_len !=
+		    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+			return -EINVAL;
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/quic/Kconfig b/net/quic/Kconfig
new file mode 100644
index 000000000000..661cb989508a
--- /dev/null
+++ b/net/quic/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# QUIC configuration
+#
+config QUIC
+	tristate "QUIC encryption offload"
+	depends on INET
+	select CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_GCM
+	help
+	Enable kernel support for QUIC crypto offload. Currently only TX
+	encryption offload is supported. The kernel will perform
+	copy-during-encryption.
+
+	If unsure, say N.
diff --git a/net/quic/Makefile b/net/quic/Makefile
new file mode 100644
index 000000000000..928239c4d08c
--- /dev/null
+++ b/net/quic/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the QUIC subsystem
+#
+
+obj-$(CONFIG_QUIC) += quic.o
+
+quic-y := quic_main.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
new file mode 100644
index 000000000000..95de3a961479
--- /dev/null
+++ b/net/quic/quic_main.c
@@ -0,0 +1,1371 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <crypto/skcipher.h>
+#include <linux/bug.h>
+#include <linux/module.h>
+#include <linux/rhashtable.h>
+// Include header to use TLS constants for AEAD cipher.
+#include <net/tls.h>
+#include <net/quic.h>
+#include <net/udp.h>
+#include <uapi/linux/quic.h>
+
+static unsigned long af_init_done;
+static struct proto quic_v4_proto;
+static struct proto quic_v6_proto;
+static DEFINE_SPINLOCK(quic_proto_lock);
+
+static u32 quic_tx_connection_hash(const void *data, u32 len, u32 seed)
+{
+	return jhash(data, len, seed);
+}
+
+static u32 quic_tx_connection_hash_obj(const void *data, u32 len, u32 seed)
+{
+	const struct quic_connection_rhash *connhash = data;
+
+	return jhash(&connhash->crypto_ctx.conn_info.key,
+		     sizeof(struct quic_connection_info_key), seed);
+}
+
+static int quic_tx_connection_hash_cmp(struct rhashtable_compare_arg *arg,
+				       const void *ptr)
+{
+	const struct quic_connection_info_key *key = arg->key;
+	const struct quic_connection_rhash *x = ptr;
+
+	return !!memcmp(&x->crypto_ctx.conn_info.key,
+			key,
+			sizeof(struct quic_connection_info_key));
+}
+
+static const struct rhashtable_params quic_tx_connection_params = {
+	.key_len		= sizeof(struct quic_connection_info_key),
+	.head_offset		= offsetof(struct quic_connection_rhash, node),
+	.hashfn			= quic_tx_connection_hash,
+	.obj_hashfn		= quic_tx_connection_hash_obj,
+	.obj_cmpfn		= quic_tx_connection_hash_cmp,
+	.automatic_shrinking	= true,
+};
+
+static size_t quic_crypto_key_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_KEY_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_KEY_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_KEY_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return 0;
+}
+
+static size_t quic_crypto_tag_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_TAG_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_TAG_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return 0;
+}
+
+static size_t quic_crypto_nonce_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_128_SALT_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_256_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_256_SALT_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_CCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_CCM_128_SALT_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+			     TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+		       TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return 0;
+}
+
+static u8 *quic_payload_iv(struct quic_internal_crypto_context *crypto_ctx)
+{
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return crypto_ctx->conn_info.aes_gcm_128.payload_iv;
+	case TLS_CIPHER_AES_GCM_256:
+		return crypto_ctx->conn_info.aes_gcm_256.payload_iv;
+	case TLS_CIPHER_AES_CCM_128:
+		return crypto_ctx->conn_info.aes_ccm_128.payload_iv;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return crypto_ctx->conn_info.chacha20_poly1305.payload_iv;
+	default:
+		break;
+	}
+	WARN_ONCE(1, "Unsupported cipher type");
+	return NULL;
+}
+
+static int
+quic_config_header_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_skcipher *tfm;
+	char *header_cipher;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_128.header_key;
+		break;
+	case TLS_CIPHER_AES_GCM_256:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_256.header_key;
+		break;
+	case TLS_CIPHER_AES_CCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_ccm_128.header_key;
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		header_cipher = "chacha20";
+		key = crypto_ctx->conn_info.chacha20_poly1305.header_key;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	tfm = crypto_alloc_skcipher(header_cipher, 0, 0);
+	if (IS_ERR(tfm)) {
+		rc = PTR_ERR(tfm);
+		goto out;
+	}
+
+	rc = crypto_skcipher_setkey(tfm, key,
+				    quic_crypto_key_size(crypto_ctx->conn_info
+							 .cipher_type));
+	if (rc) {
+		crypto_free_skcipher(tfm);
+		goto out;
+	}
+
+	crypto_ctx->header_tfm = tfm;
+
+out:
+	return rc;
+}
+
+static int
+quic_config_packet_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_aead *aead;
+	char *cipher_name;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128: {
+		key = crypto_ctx->conn_info.aes_gcm_128.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_GCM_256: {
+		key = crypto_ctx->conn_info.aes_gcm_256.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_CCM_128: {
+		key = crypto_ctx->conn_info.aes_ccm_128.payload_key;
+		cipher_name = "ccm(aes)";
+		break;
+	}
+	case TLS_CIPHER_CHACHA20_POLY1305: {
+		key = crypto_ctx->conn_info.chacha20_poly1305.payload_key;
+		cipher_name = "rfc7539(chacha20,poly1305)";
+		break;
+	}
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	aead = crypto_alloc_aead(cipher_name, 0, 0);
+	if (IS_ERR(aead)) {
+		rc = PTR_ERR(aead);
+		goto out;
+	}
+
+	rc = crypto_aead_setkey(aead, key,
+				quic_crypto_key_size(crypto_ctx->conn_info
+						     .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	rc = crypto_aead_setauthsize(aead,
+				     quic_crypto_tag_size(crypto_ctx->conn_info
+							  .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	crypto_ctx->packet_aead = aead;
+	goto out;
+
+free_aead:
+	crypto_free_aead(aead);
+
+out:
+	return rc;
+}
+
+static inline struct quic_context *quic_get_ctx(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	return (__force void *)rcu_access_pointer(inet->ulp_data);
+}
+
+static void quic_free_cipher_page(struct page *page)
+{
+	__free_pages(page, QUIC_MAX_CIPHER_PAGES_ORDER);
+}
+
+static struct quic_context *quic_ctx_create(void)
+{
+	struct quic_context *ctx;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+
+	mutex_init(&ctx->sendmsg_mux);
+	ctx->cipher_page = alloc_pages(GFP_KERNEL, QUIC_MAX_CIPHER_PAGES_ORDER);
+	if (!ctx->cipher_page)
+		goto out_err;
+
+	if (rhashtable_init(&ctx->tx_connections,
+			    &quic_tx_connection_params) < 0) {
+		quic_free_cipher_page(ctx->cipher_page);
+		goto out_err;
+	}
+
+	return ctx;
+
+out_err:
+	kfree(ctx);
+	return NULL;
+}
+
+static int quic_getsockopt(struct sock *sk, int level, int optname,
+			   char __user *optval, int __user *optlen)
+{
+	struct quic_context *ctx = quic_get_ctx(sk);
+
+	return ctx->sk_proto->getsockopt(sk, level, optname, optval, optlen);
+}
+
+static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	int rc = 0;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	connhash = kzalloc(sizeof(*connhash), GFP_KERNEL);
+	if (!connhash)
+		return -ENOMEM;
+
+	crypto_ctx = &connhash->crypto_ctx;
+	rc = copy_from_sockptr(&crypto_ctx->conn_info, optval,
+			       sizeof(crypto_ctx->conn_info));
+	if (rc) {
+		rc = -EFAULT;
+		goto err_crypto_info;
+	}
+
+	// create all TLS materials for packet and header decryption
+	rc = quic_config_header_crypto(crypto_ctx);
+	if (rc)
+		goto err_crypto_info;
+
+	rc = quic_config_packet_crypto(crypto_ctx);
+	if (rc)
+		goto err_free_skcipher;
+
+	// insert crypto data into hash per connection ID
+	rc = rhashtable_insert_fast(&ctx->tx_connections, &connhash->node,
+				    quic_tx_connection_params);
+	if (rc < 0)
+		goto err_free_ciphers;
+
+	return 0;
+
+err_free_ciphers:
+	crypto_free_aead(crypto_ctx->packet_aead);
+
+err_free_skcipher:
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+
+err_crypto_info:
+	// wipe out all crypto material
+	memzero_explicit(&connhash->crypto_ctx, sizeof(connhash->crypto_ctx));
+	kfree(connhash);
+	return rc;
+}
+
+static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	struct quic_connection_info conn_info;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&conn_info, optval, optlen))
+		return -EFAULT;
+
+	connhash = rhashtable_lookup_fast(&ctx->tx_connections,
+					  &conn_info.key,
+					  quic_tx_connection_params);
+	if (!connhash)
+		return -EINVAL;
+
+	rhashtable_remove_fast(&ctx->tx_connections,
+			       &connhash->node,
+			       quic_tx_connection_params);
+
+	crypto_ctx = &connhash->crypto_ctx;
+
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+	crypto_free_aead(crypto_ctx->packet_aead);
+	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	kfree(connhash);
+
+	return 0;
+}
+
+static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
+			      unsigned int optlen)
+{
+	int rc = 0;
+
+	switch (optname) {
+	case UDP_QUIC_ADD_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_add_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	case UDP_QUIC_DEL_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_del_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	default:
+		rc = -ENOPROTOOPT;
+		break;
+	}
+
+	return rc;
+}
+
+static int quic_setsockopt(struct sock *sk, int level, int optname,
+			   sockptr_t optval, unsigned int optlen)
+{
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	sk_proto = ctx->sk_proto;
+	rcu_read_unlock();
+
+	if (level == SOL_UDP &&
+	    (optname == UDP_QUIC_ADD_TX_CONNECTION ||
+	     optname == UDP_QUIC_DEL_TX_CONNECTION))
+		return do_quic_setsockopt(sk, optname, optval, optlen);
+
+	return sk_proto->setsockopt(sk, level, optname, optval, optlen);
+}
+
+static int
+quic_extract_ancillary_data(struct msghdr *msg,
+			    struct quic_tx_ancillary_data *ancillary_data,
+			    u16 *udp_pkt_size)
+{
+	struct cmsghdr *cmsg_hdr = NULL;
+	void *ancillary_data_ptr = NULL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	for_each_cmsghdr(cmsg_hdr, msg) {
+		if (!CMSG_OK(msg, cmsg_hdr))
+			return -EINVAL;
+
+		if (cmsg_hdr->cmsg_level != IPPROTO_UDP)
+			continue;
+
+		if (cmsg_hdr->cmsg_type == UDP_QUIC_ENCRYPT) {
+			if (cmsg_hdr->cmsg_len !=
+			    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+				return -EINVAL;
+			memcpy((void *)ancillary_data, CMSG_DATA(cmsg_hdr),
+			       sizeof(struct quic_tx_ancillary_data));
+			ancillary_data_ptr = cmsg_hdr;
+		} else if (cmsg_hdr->cmsg_type == UDP_SEGMENT) {
+			if (cmsg_hdr->cmsg_len != CMSG_LEN(sizeof(u16)))
+				return -EINVAL;
+			memcpy((void *)udp_pkt_size, CMSG_DATA(cmsg_hdr),
+			       sizeof(u16));
+		}
+	}
+
+	if (!ancillary_data_ptr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int quic_sendmsg_validate(struct msghdr *msg)
+{
+	if (!iter_is_iovec(&msg->msg_iter))
+		return -EINVAL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct quic_connection_rhash
+*quic_lookup_connection(struct quic_context *ctx,
+			u8 *conn_id,
+			struct quic_tx_ancillary_data *ancillary_data)
+{
+	struct quic_connection_info_key conn_key;
+
+	// Lookup connection information by the connection key.
+	memset(&conn_key, 0, sizeof(struct quic_connection_info_key));
+	// fill the connection id up to the max connection ID length
+	if (ancillary_data->conn_id_length > QUIC_MAX_CONNECTION_ID_SIZE)
+		return NULL;
+
+	conn_key.conn_id_length = ancillary_data->conn_id_length;
+	if (ancillary_data->conn_id_length)
+		memcpy(conn_key.conn_id,
+		       conn_id,
+		       ancillary_data->conn_id_length);
+	return rhashtable_lookup_fast(&ctx->tx_connections,
+				      &conn_key,
+				      quic_tx_connection_params);
+}
+
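+/* Upper bound on the scatterlist entries needed for the plaintext: in the
+ * worst case every page boundary and every packet boundary starts a new
+ * entry, plus one spare.
+ */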
+static int quic_sg_capacity_from_msg(const size_t pkt_size,
+				     const off_t offset,
+				     const size_t length)
+{
+	size_t	pages = 0;
+	size_t	pkts = 0;
+
+	pages = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+	pkts = DIV_ROUND_UP(length, pkt_size);
+	return pages + pkts + 1;
+}
+
+static void quic_put_plain_user_pages(struct page **pages, size_t nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (i == 0 || pages[i] != pages[i - 1])
+			put_page(pages[i]);
+}
+
+static int quic_get_plain_user_pages(struct msghdr * const msg,
+				     struct page **pages,
+				     int *page_indices)
+{
+	size_t	nr_mapped = 0;
+	size_t	nr_pages = 0;
+	void __user	*data_addr;
+	void	*page_addr;
+	size_t	count = 0;
+	off_t	data_off;
+	int	ret = 0;
+	int	i;
+
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		data_addr = msg->msg_iter.iov[i].iov_base;
+		if (!i)
+			data_addr += msg->msg_iter.iov_offset;
+		page_addr =
+			(void *)((unsigned long)data_addr & PAGE_MASK);
+
+		data_off = (unsigned long)data_addr & ~PAGE_MASK;
+		nr_pages =
+			DIV_ROUND_UP(data_off + msg->msg_iter.iov[i].iov_len,
+				     PAGE_SIZE);
+		if (nr_mapped + nr_pages > QUIC_MAX_PLAIN_PAGES) {
+			quic_put_plain_user_pages(pages, nr_mapped);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		count = get_user_pages((unsigned long)page_addr, nr_pages, 1,
+				       pages, NULL);
+		if (count < nr_pages) {
+			quic_put_plain_user_pages(pages, nr_mapped + count);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		page_indices[i] = nr_mapped;
+		nr_mapped += count;
+		pages += count;
+	}
+	ret = nr_mapped;
+
+out:
+	return ret;
+}
+
+static int quic_sg_plain_from_mapped_msg(struct msghdr * const msg,
+					 struct page **plain_pages,
+					 void **iov_base_ptrs,
+					 void **iov_data_ptrs,
+					 const size_t plain_size,
+					 const size_t pkt_size,
+					 struct scatterlist * const sg_alloc,
+					 const size_t max_sg_alloc,
+					 struct scatterlist ** const sg_pkts,
+					 size_t *nr_plain_pages)
+{
+	int iov_page_indices[QUIC_MAX_IOVEC_SEGMENTS];
+	struct scatterlist *sg;
+	unsigned int pkt_i = 0;
+	ssize_t left_on_page;
+	size_t pkt_left;
+	unsigned int i;
+	size_t seg_len;
+	off_t page_ofs;
+	off_t seg_ofs;
+	int ret = 0;
+	int page_i;
+
+	if (msg->msg_iter.nr_segs >= QUIC_MAX_IOVEC_SEGMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = quic_get_plain_user_pages(msg, plain_pages, iov_page_indices);
+	if (ret < 0)
+		goto out;
+
+	*nr_plain_pages = ret;
+	sg = sg_alloc;
+	sg_pkts[pkt_i] = sg;
+	sg_unmark_end(sg);
+	pkt_left = pkt_size;
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		page_ofs = ((unsigned long)msg->msg_iter.iov[i].iov_base
+			   & (PAGE_SIZE - 1));
+		page_i = 0;
+		if (!i) {
+			page_ofs += msg->msg_iter.iov_offset;
+			while (page_ofs >= PAGE_SIZE) {
+				page_ofs -= PAGE_SIZE;
+				page_i++;
+			}
+		}
+
+		seg_len = msg->msg_iter.iov[i].iov_len;
+		page_i += iov_page_indices[i];
+
+		if (page_i >= QUIC_MAX_PLAIN_PAGES)
+			return -EFAULT;
+
+		seg_ofs = 0;
+		while (seg_ofs < seg_len) {
+			if (sg - sg_alloc > max_sg_alloc)
+				return -EFAULT;
+
+			sg_unmark_end(sg);
+			left_on_page = min_t(size_t, PAGE_SIZE - page_ofs,
+					     seg_len - seg_ofs);
+			if (left_on_page <= 0)
+				return -EFAULT;
+
+			if (left_on_page > pkt_left) {
+				sg_set_page(sg, plain_pages[page_i], pkt_left,
+					    page_ofs);
+				pkt_i++;
+				seg_ofs += pkt_left;
+				page_ofs += pkt_left;
+				sg_mark_end(sg);
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+				continue;
+			}
+			sg_set_page(sg, plain_pages[page_i], left_on_page,
+				    page_ofs);
+			page_i++;
+			page_ofs = 0;
+			seg_ofs += left_on_page;
+			pkt_left -= left_on_page;
+			if (pkt_left == 0 ||
+			    (seg_ofs == seg_len &&
+			     i == msg->msg_iter.nr_segs - 1)) {
+				sg_mark_end(sg);
+				pkt_i++;
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+			} else {
+				sg++;
+			}
+		}
+	}
+
+	if (pkt_left && pkt_left != pkt_size) {
+		pkt_i++;
+		sg_mark_end(sg);
+	}
+	ret = pkt_i;
+
+out:
+	return ret;
+}
+
+/* sg_alloc: allocated zeroed array of scatterlists
+ * cipher_page: preallocated compound page
+ */
+static int quic_sg_cipher_from_pkts(const size_t cipher_tag_size,
+				    const size_t plain_pkt_size,
+				    const size_t plain_size,
+				    struct page * const cipher_page,
+				    struct scatterlist * const sg_alloc,
+				    const size_t nr_sg_alloc,
+				    struct scatterlist ** const sg_cipher)
+{
+	const size_t cipher_pkt_size = plain_pkt_size + cipher_tag_size;
+	size_t pkts = DIV_ROUND_UP(plain_size, plain_pkt_size);
+	struct scatterlist *sg = sg_alloc;
+	int pkt_i;
+	void *ptr;
+
+	if (pkts > nr_sg_alloc)
+		return -EINVAL;
+
+	ptr = page_address(cipher_page);
+	for (pkt_i = 0; pkt_i < pkts;
+		++pkt_i, ptr += cipher_pkt_size, ++sg) {
+		sg_set_buf(sg, ptr, cipher_pkt_size);
+		sg_mark_end(sg);
+		sg_cipher[pkt_i] = sg;
+	}
+	return pkts;
+}
+
+/* fast copy from scatterlist to a buffer assuming that all pages are
+ * available in kernel memory.
+ */
+static int quic_sg_pcopy_to_buffer_kernel(struct scatterlist *sg,
+					  u8 *buffer,
+					  size_t bytes_to_copy,
+					  off_t offset_to_read)
+{
+	off_t sg_remain = sg->length;
+	size_t to_copy;
+
+	if (!bytes_to_copy)
+		return 0;
+
+	// skip to offset first
+	while (offset_to_read > 0) {
+		if (!sg_remain)
+			return -EINVAL;
+		if (offset_to_read < sg_remain) {
+			sg_remain -= offset_to_read;
+			break;
+		}
+		offset_to_read -= sg_remain;
+		sg = sg_next(sg);
+		if (!sg)
+			return -EINVAL;
+		sg_remain = sg->length;
+	}
+
+	// traverse sg list from offset to offset + bytes_to_copy
+	while (bytes_to_copy) {
+		to_copy = min_t(size_t, bytes_to_copy, sg_remain);
+		if (!to_copy)
+			return -EINVAL;
+		memcpy(buffer, sg_virt(sg) + (sg->length - sg_remain), to_copy);
+		buffer += to_copy;
+		bytes_to_copy -= to_copy;
+		if (bytes_to_copy) {
+			sg = sg_next(sg);
+			if (!sg)
+				return -EINVAL;
+			sg_remain = sg->length;
+		}
+	}
+
+	return 0;
+}
+
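+/* Copy the short header of one packet into a linear buffer.  The header is
+ * one flags byte, the destination connection ID (its length comes from the
+ * ancillary data) and a 1-4 byte packet number whose length is encoded in
+ * the two low bits of the flags byte.  Returns the header length or a
+ * negative errno.
+ */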
+static int quic_copy_header(struct scatterlist *sg_plain,
+			    u8 *buf, const size_t buf_len,
+			    const size_t conn_id_len)
+{
+	u8 *pkt = sg_virt(sg_plain);
+	size_t hdr_len;
+
+	hdr_len = 1 + conn_id_len + ((*pkt & 0x03) + 1);
+	if (hdr_len > QUIC_MAX_SHORT_HEADER_SIZE || hdr_len > buf_len)
+		return -EINVAL;
+
+	WARN_ON_ONCE(quic_sg_pcopy_to_buffer_kernel(sg_plain, buf, hdr_len, 0));
+	return hdr_len;
+}
+
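+/* Recover the full packet number from the truncated value in the short
+ * header, using the expected packet number from the ancillary data as the
+ * reference (decoding as in RFC 9000, Appendix A).  For example, an
+ * expected value of 0xa82f30eb and a 2-byte truncated value of 0x9b32
+ * decode to 0xa82f9b32.
+ */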
+static u64 quic_unpack_pkt_num(struct quic_tx_ancillary_data * const control,
+			       const u8 * const hdr,
+			       const off_t payload_crypto_off)
+{
+	u64 truncated_pn = 0;
+	u64 candidate_pn;
+	u64 expected_pn;
+	u64 pn_hwin;
+	u64 pn_mask;
+	u64 pn_len;
+	u64 pn_win;
+	int i;
+
+	pn_len = (hdr[0] & 0x03) + 1;
+	expected_pn = control->next_pkt_num;
+
+	for (i = 1 + control->conn_id_length; i < payload_crypto_off; ++i) {
+		truncated_pn <<= 8;
+		truncated_pn |= hdr[i];
+	}
+
+	pn_win = 1ULL << (pn_len << 3);
+	pn_hwin = pn_win >> 1;
+	pn_mask = pn_win - 1;
+	candidate_pn = (expected_pn & ~pn_mask) | truncated_pn;
+
+	if (expected_pn > pn_hwin &&
+	    candidate_pn <= expected_pn - pn_hwin &&
+	    candidate_pn < (1ULL << 62) - pn_win)
+		return candidate_pn + pn_win;
+
+	if (candidate_pn > expected_pn + pn_hwin &&
+	    candidate_pn >= pn_win)
+		return candidate_pn - pn_win;
+
+	return candidate_pn;
+}
+
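+/* Derive the header protection mask from a 16-byte sample of the packet
+ * ciphertext (RFC 9001, Section 5.4).  For the AES based ciphers the mask
+ * is the ECB encryption of the sample with the header key; for
+ * ChaCha20-Poly1305 the first four sample bytes form the block counter and
+ * the remaining twelve the nonce used to encrypt five zero bytes.
+ */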
+static int
+quic_construct_header_prot_mask(struct quic_internal_crypto_context *crypto_ctx,
+				struct skcipher_request *hdr_mask_req,
+				struct scatterlist *sg_cipher_pkt,
+				off_t sample_offset,
+				u8 *hdr_mask)
+{
+	u8 *sample = sg_virt(sg_cipher_pkt) + sample_offset;
+	u8 hdr_ctr[sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE];
+	u8 chacha20_zeros[5] = {0, 0, 0, 0, 0};
+	struct scatterlist sg_cipher_sample;
+	struct scatterlist sg_hdr_mask;
+	struct crypto_wait wait_header;
+	__le32	counter;
+
+	BUILD_BUG_ON(QUIC_HDR_MASK_SIZE
+		     < sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE);
+
+	sg_init_one(&sg_hdr_mask, hdr_mask, QUIC_HDR_MASK_SIZE);
+	skcipher_request_set_callback(hdr_mask_req, 0, crypto_req_done,
+				      &wait_header);
+
+	if (crypto_ctx->conn_info.cipher_type == TLS_CIPHER_CHACHA20_POLY1305) {
+		sg_init_one(&sg_cipher_sample, (u8 *)chacha20_zeros,
+			    sizeof(chacha20_zeros));
+		counter = cpu_to_le32(*((u32 *)sample));
+		memset(hdr_ctr, 0, sizeof(hdr_ctr));
+		memcpy((u8 *)hdr_ctr, (u8 *)&counter, sizeof(u32));
+		memcpy((u8 *)hdr_ctr + sizeof(u32),
+		       (sample + sizeof(u32)),
+		       QUIC_CIPHER_MAX_IV_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, 5, hdr_ctr);
+	} else {
+		// cipher pages are continuous, get the pointer to the sg data
+		// directly, pages are allocated in kernel
+		sg_init_one(&sg_cipher_sample, sample, QUIC_HDR_MASK_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, QUIC_HDR_MASK_SIZE,
+					   NULL);
+	}
+
+	return crypto_wait_req(crypto_skcipher_encrypt(hdr_mask_req),
+			       &wait_header);
+}
+
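+/* Apply header protection to one encrypted packet (RFC 9001,
+ * Section 5.4.1).  The ciphertext sample starts four bytes past the
+ * beginning of the packet number field, i.e. at payload_crypto_off +
+ * (4 - packet number length); the low five bits of the first byte and the
+ * packet number bytes are XORed with the mask.
+ */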
+static int quic_protect_header(struct quic_internal_crypto_context *crypto_ctx,
+			       struct quic_tx_ancillary_data *control,
+			       struct skcipher_request *hdr_mask_req,
+			       struct scatterlist *sg_cipher_pkt,
+			       int payload_crypto_off)
+{
+	u8 hdr_mask[QUIC_HDR_MASK_SIZE];
+	off_t quic_pkt_num_off;
+	u8 quic_pkt_num_len;
+	u8 *cipher_hdr;
+	int err;
+	int i;
+
+	quic_pkt_num_off = 1 + control->conn_id_length;
+	quic_pkt_num_len = payload_crypto_off - quic_pkt_num_off;
+
+	if (quic_pkt_num_len > 4)
+		return -EPERM;
+
+	err = quic_construct_header_prot_mask(crypto_ctx, hdr_mask_req,
+					      sg_cipher_pkt,
+					      payload_crypto_off +
+					      (4 - quic_pkt_num_len),
+					      hdr_mask);
+	if (unlikely(err))
+		return err;
+
+	cipher_hdr = sg_virt(sg_cipher_pkt);
+	// protect the public flags
+	cipher_hdr[0] ^= (hdr_mask[0] & 0x1f);
+
+	for (i = 0; i < quic_pkt_num_len; ++i)
+		cipher_hdr[quic_pkt_num_off + i] ^= hdr_mask[1 + i];
+
+	return 0;
+}
+
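+/* Build the per-packet AEAD nonce (RFC 9001, Section 5.3): the packet
+ * number, left-padded to the nonce length, is XORed into the payload IV
+ * starting from the least significant byte.
+ */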
+static
+void quic_construct_ietf_nonce(u8 *nonce,
+			       struct quic_internal_crypto_context *crypto_ctx,
+			       u64 quic_pkt_num)
+{
+	u8 *iv = quic_payload_iv(crypto_ctx);
+	int i;
+
+	for (i = quic_crypto_nonce_size(crypto_ctx->conn_info.cipher_type) - 1;
+	     i >= 0 && quic_pkt_num;
+	     --i, quic_pkt_num >>= 8)
+		nonce[i] = iv[i] ^ (u8)quic_pkt_num;
+
+	for (; i >= 0; --i)
+		nonce[i] = iv[i];
+}
+
+static ssize_t quic_sendpage(struct quic_context *ctx,
+			     struct sock *sk,
+			     struct msghdr *msg,
+			     const size_t cipher_size,
+			     struct page * const cipher_page)
+{
+	struct kvec iov;
+	ssize_t ret;
+
+	iov.iov_base = page_address(cipher_page);
+	iov.iov_len = cipher_size;
+	iov_iter_kvec(&msg->msg_iter, WRITE, &iov, 1, cipher_size);
+	ret = security_socket_sendmsg(sk->sk_socket, msg, msg_data_left(msg));
+	if (ret)
+		return ret;
+
+	ret = ctx->sk_proto->sendmsg(sk, msg, msg_data_left(msg));
+	WARN_ON(ret == -EIOCBQUEUED);
+	return ret;
+}
+
+static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_internal_crypto_context *crypto_ctx = NULL;
+	struct scatterlist *sg_cipher_pkts[QUIC_MAX_GSO_FRAGS];
+	struct scatterlist *sg_plain_pkts[QUIC_MAX_GSO_FRAGS];
+	struct page *plain_pages[QUIC_MAX_PLAIN_PAGES];
+	void *plain_base_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	void *plain_data_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	struct msghdr msg_cipher = {
+		.msg_name = msg->msg_name,
+		.msg_namelen = msg->msg_namelen,
+		.msg_flags = msg->msg_flags,
+		.msg_control = msg->msg_control,
+		.msg_controllen = msg->msg_controllen,
+	};
+	struct quic_connection_rhash *connhash = NULL;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	u8 hdr_buf[QUIC_MAX_SHORT_HEADER_SIZE];
+	struct skcipher_request *hdr_mask_req;
+	struct quic_tx_ancillary_data control;
+	u8 nonce[QUIC_CIPHER_MAX_NONCE_SIZE];
+	struct	aead_request *aead_req = NULL;
+	struct scatterlist *sg_cipher = NULL;
+	struct udp_sock *up = udp_sk(sk);
+	struct scatterlist *sg_plain = NULL;
+	u16 gso_pkt_size = up->gso_size;
+	size_t last_plain_pkt_size = 0;
+	off_t	payload_crypto_offset;
+	struct crypto_aead *tfm = NULL;
+	size_t nr_plain_pages = 0;
+	struct crypto_wait waiter;
+	size_t nr_sg_cipher_pkts;
+	size_t nr_sg_plain_pkts;
+	ssize_t hdr_buf_len = 0;
+	size_t nr_sg_alloc = 0;
+	size_t plain_pkt_size;
+	u64	full_pkt_num;
+	size_t cipher_size;
+	size_t plain_size;
+	size_t pkt_size;
+	size_t tag_size;
+	int ret = 0;
+	int pkt_i;
+	int err;
+
+	memset(&hdr_buf[0], 0, QUIC_MAX_SHORT_HEADER_SIZE);
+	hdr_buf_len = copy_from_iter(hdr_buf, QUIC_MAX_SHORT_HEADER_SIZE,
+				     &msg->msg_iter);
+	if (hdr_buf_len <= 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+	iov_iter_revert(&msg->msg_iter, hdr_buf_len);
+
+	ctx = quic_get_ctx(sk);
+
+	// Bypass anything that is guaranteed not to be QUIC.
+	plain_size = len;
+
+	if (plain_size < 2)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Bypass for other than short header.
+	if ((hdr_buf[0] & 0xc0) != 0x40)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Crypto adds a tag after the packet. Corking a payload would produce
+	// a crypto tag after each portion. Use GSO instead.
+	if ((msg->msg_flags & MSG_MORE) || up->pending) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = quic_sendmsg_validate(msg);
+	if (ret)
+		goto out;
+
+	ret = quic_extract_ancillary_data(msg, &control, &gso_pkt_size);
+	if (ret)
+		goto out;
+
+	// Flag bits outside QUIC_ANCILLARY_FLAGS (reserved bits) are an error.
+	if (control.flags & ~QUIC_ANCILLARY_FLAGS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Bypass offload on request. First packet bypass applies to all
+	// packets in the GSO pack.
+	if (control.flags & QUIC_BYPASS_ENCRYPTION)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	if (hdr_buf_len < 1 + control.conn_id_length) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Fetch the flow
+	connhash = quic_lookup_connection(ctx, &hdr_buf[1], &control);
+	if (!connhash) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	crypto_ctx = &connhash->crypto_ctx;
+
+	tag_size = quic_crypto_tag_size(crypto_ctx->conn_info.cipher_type);
+
+	// For GSO, use the GSO size minus cipher tag size as the packet size;
+	// for non-GSO, use the size of the whole plaintext.
+	// Reduce the packet size by tag size to keep the original packet size
+	// for the rest of the UDP path in the stack.
+	if (!gso_pkt_size) {
+		plain_pkt_size = plain_size;
+	} else {
+		if (gso_pkt_size < tag_size) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		plain_pkt_size = gso_pkt_size - tag_size;
+	}
+
+	// Build scatterlist from the input data, split by GSO minus the
+	// crypto tag size.
+	nr_sg_alloc = quic_sg_capacity_from_msg(plain_pkt_size,
+						msg->msg_iter.iov_offset,
+						plain_size);
+	if ((nr_sg_alloc * 2) >= QUIC_MAX_SG_ALLOC_ELEMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	sg_plain = ctx->sg_alloc;
+	sg_cipher = sg_plain + nr_sg_alloc;
+
+	ret = quic_sg_plain_from_mapped_msg(msg, plain_pages,
+					    plain_base_ptrs,
+					    plain_data_ptrs, plain_size,
+					    plain_pkt_size, sg_plain,
+					    nr_sg_alloc, sg_plain_pkts,
+					    &nr_plain_pages);
+
+	if (ret < 0)
+		goto out;
+
+	nr_sg_plain_pkts = ret;
+	last_plain_pkt_size = plain_size % plain_pkt_size;
+	if (!last_plain_pkt_size)
+		last_plain_pkt_size = plain_pkt_size;
+
+	// Build scatterlist for the ciphertext, split by GSO.
+	cipher_size = plain_size + nr_sg_plain_pkts * tag_size;
+
+	if (DIV_ROUND_UP(cipher_size, PAGE_SIZE)
+	    >= (1 << QUIC_MAX_CIPHER_PAGES_ORDER)) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	ret = quic_sg_cipher_from_pkts(tag_size, plain_pkt_size, plain_size,
+				       ctx->cipher_page, sg_cipher, nr_sg_alloc,
+				       sg_cipher_pkts);
+	if (ret < 0)
+		goto out_put_pages;
+
+	nr_sg_cipher_pkts = ret;
+
+	if (nr_sg_plain_pkts != nr_sg_cipher_pkts) {
+		ret = -EPERM;
+		goto out_put_pages;
+	}
+
+	// Encrypt and protect header for each packet individually.
+	tfm = crypto_ctx->packet_aead;
+	crypto_aead_clear_flags(tfm, ~0);
+	aead_req = aead_request_alloc(tfm, GFP_KERNEL);
+	if (!aead_req) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	hdr_mask_req = skcipher_request_alloc(crypto_ctx->header_tfm,
+					      GFP_KERNEL);
+	if (!hdr_mask_req) {
+		aead_request_free(aead_req);
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	for (pkt_i = 0; pkt_i < nr_sg_plain_pkts; ++pkt_i) {
+		payload_crypto_offset =
+			quic_copy_header(sg_plain_pkts[pkt_i],
+					 hdr_buf,
+					 sizeof(hdr_buf),
+					 control.conn_id_length);
+
+		pkt_size = (pkt_i + 1 < nr_sg_plain_pkts
+				? plain_pkt_size
+				: last_plain_pkt_size);
+		/* pkt_size is unsigned: reject a malformed header and make
+		 * sure the header fits before subtracting its length.
+		 */
+		if (payload_crypto_offset < 0 ||
+		    payload_crypto_offset >= pkt_size) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		pkt_size -= payload_crypto_offset;
+
+		full_pkt_num = quic_unpack_pkt_num(&control, hdr_buf,
+						   payload_crypto_offset);
+
+		/* Construct nonce and initialize request */
+		quic_construct_ietf_nonce(nonce, crypto_ctx, full_pkt_num);
+
+		/* Encrypt the body */
+		aead_request_set_callback(aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG
+					  | CRYPTO_TFM_REQ_MAY_SLEEP,
+					  crypto_req_done, &waiter);
+		aead_request_set_crypt(aead_req, sg_plain_pkts[pkt_i],
+				       sg_cipher_pkts[pkt_i],
+				       pkt_size,
+				       nonce);
+		aead_request_set_ad(aead_req, payload_crypto_offset);
+		err = crypto_wait_req(crypto_aead_encrypt(aead_req), &waiter);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+
+		/* Protect the header */
+		memcpy(sg_virt(sg_cipher_pkts[pkt_i]), hdr_buf,
+		       payload_crypto_offset);
+
+		err = quic_protect_header(crypto_ctx, &control,
+					  hdr_mask_req,
+					  sg_cipher_pkts[pkt_i],
+					  payload_crypto_offset);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+	}
+	skcipher_request_free(hdr_mask_req);
+	aead_request_free(aead_req);
+
+	// Deliver to the next layer.
+	if (ctx->sk_proto->sendpage) {
+		msg_cipher.msg_flags |= MSG_MORE;
+		err = ctx->sk_proto->sendmsg(sk, &msg_cipher, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+
+		err = ctx->sk_proto->sendpage(sk, ctx->cipher_page, 0,
+					      cipher_size, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+		if (err != cipher_size) {
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		ret = plain_size;
+	} else {
+		ret = quic_sendpage(ctx, sk, &msg_cipher, cipher_size,
+				    ctx->cipher_page);
+		// indicate full plaintext transmission to the caller.
+		if (ret > 0)
+			ret = plain_size;
+	}
+
+out_put_pages:
+	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
+
+out:
+	return ret;
+}
+
+static int quic_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_context *ctx;
+	int ret;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	rcu_read_unlock();
+	if (!ctx)
+		return -EINVAL;
+
+	mutex_lock(&ctx->sendmsg_mux);
+	ret = quic_sendmsg(sk, msg, len);
+	mutex_unlock(&ctx->sendmsg_mux);
+	return ret;
+}
+
+static void quic_release_resources(struct sock *sk)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_connection_rhash *connhash;
+	struct inet_sock *inet = inet_sk(sk);
+	struct rhashtable_iter hti;
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	if (!ctx) {
+		rcu_read_unlock();
+		return;
+	}
+
+	sk_proto = ctx->sk_proto;
+
+	rhashtable_walk_enter(&ctx->tx_connections, &hti);
+	rhashtable_walk_start(&hti);
+
+	while ((connhash = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(connhash)) {
+			if (PTR_ERR(connhash) == -EAGAIN)
+				continue;
+			break;
+		}
+
+		crypto_ctx = &connhash->crypto_ctx;
+		crypto_free_aead(crypto_ctx->packet_aead);
+		crypto_free_skcipher(crypto_ctx->header_tfm);
+		memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	}
+
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+	rhashtable_destroy(&ctx->tx_connections);
+
+	if (ctx->cipher_page) {
+		quic_free_cipher_page(ctx->cipher_page);
+		ctx->cipher_page = NULL;
+	}
+
+	rcu_read_unlock();
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, NULL);
+	WRITE_ONCE(sk->sk_prot, sk_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+
+	kfree_rcu(ctx, rcu);
+}
+
+static void
+quic_prep_protos(unsigned int af, struct proto *proto, const struct proto *base)
+{
+	if (likely(test_bit(af, &af_init_done)))
+		return;
+
+	spin_lock(&quic_proto_lock);
+	if (test_bit(af, &af_init_done))
+		goto out_unlock;
+
+	*proto			= *base;
+	proto->setsockopt	= quic_setsockopt;
+	proto->getsockopt	= quic_getsockopt;
+	proto->sendmsg		= quic_sendmsg_locked;
+
+	smp_mb__before_atomic(); /* proto calls should be visible first */
+	set_bit(af, &af_init_done);
+
+out_unlock:
+	spin_unlock(&quic_proto_lock);
+}
+
+static void quic_update_proto(struct sock *sk, struct quic_context *ctx)
+{
+	struct proto *udp_proto, *quic_proto;
+	struct inet_sock *inet = inet_sk(sk);
+
+	udp_proto = READ_ONCE(sk->sk_prot);
+	ctx->sk_proto = udp_proto;
+	quic_proto = sk->sk_family == AF_INET ? &quic_v4_proto : &quic_v6_proto;
+
+	quic_prep_protos(sk->sk_family, quic_proto, udp_proto);
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, ctx);
+	WRITE_ONCE(sk->sk_prot, quic_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+}
+
+static int quic_init(struct sock *sk)
+{
+	struct quic_context *ctx;
+
+	ctx = quic_ctx_create();
+	if (!ctx)
+		return -ENOMEM;
+
+	quic_update_proto(sk, ctx);
+
+	return 0;
+}
+
+static void quic_release(struct sock *sk)
+{
+	lock_sock(sk);
+	quic_release_resources(sk);
+	release_sock(sk);
+}
+
+static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
+	.name		= "quic-crypto",
+	.owner		= THIS_MODULE,
+	.init		= quic_init,
+	.release	= quic_release,
+};
+
+static int __init quic_register(void)
+{
+	udp_register_ulp(&quic_ulp_ops);
+	return 0;
+}
+
+static void __exit quic_unregister(void)
+{
+	udp_unregister_ulp(&quic_ulp_ops);
+}
+
+module_init(quic_register);
+module_exit(quic_unregister);
+
+MODULE_DESCRIPTION("QUIC crypto ULP");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_UDP_ULP("quic-crypto");
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v2 5/6] Add flow counters and Tx processing error counter
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
                     ` (3 preceding siblings ...)
  2022-08-17 20:09   ` [net-next v2 4/6] Implement QUIC offload functions Adel Abouchaev
@ 2022-08-17 20:09   ` Adel Abouchaev
  2022-08-17 20:09   ` [net-next v2 6/6] Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-17 20:09 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Added flow counters. The total flow counter is cumulative, the current
counter shows the number of flows currently in flight, and the error
counter accumulates the number of errors seen during Tx processing.
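
With the /proc/net/quic_stat file added below, the per-netns output would
look roughly like this (values are illustrative):

  # cat /proc/net/quic_stat
  QuicCurrTxSw                            2
  QuicTxSw                                17
  QuicTxSwError                           0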

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Updated enum bracket to follow enum keyword. Removed extra blank lines.
---
 include/net/netns/mib.h   |  3 +++
 include/net/quic.h        | 10 +++++++++
 include/net/snmp.h        |  6 +++++
 include/uapi/linux/snmp.h |  9 ++++++++
 net/quic/Makefile         |  2 +-
 net/quic/quic_main.c      | 46 +++++++++++++++++++++++++++++++++++++++
 net/quic/quic_proc.c      | 45 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 net/quic/quic_proc.c

diff --git a/include/net/netns/mib.h b/include/net/netns/mib.h
index 7e373664b1e7..dcbba3d1ceec 100644
--- a/include/net/netns/mib.h
+++ b/include/net/netns/mib.h
@@ -24,6 +24,9 @@ struct netns_mib {
 #if IS_ENABLED(CONFIG_TLS)
 	DEFINE_SNMP_STAT(struct linux_tls_mib, tls_statistics);
 #endif
+#if IS_ENABLED(CONFIG_QUIC)
+	DEFINE_SNMP_STAT(struct linux_quic_mib, quic_statistics);
+#endif
 #ifdef CONFIG_MPTCP
 	DEFINE_SNMP_STAT(struct mptcp_mib, mptcp_statistics);
 #endif
diff --git a/include/net/quic.h b/include/net/quic.h
index cafe01174e60..6362d827d266 100644
--- a/include/net/quic.h
+++ b/include/net/quic.h
@@ -25,6 +25,16 @@
 #define QUIC_MAX_PLAIN_PAGES		16
 #define QUIC_MAX_CIPHER_PAGES_ORDER	4
 
+#define __QUIC_INC_STATS(net, field)				\
+	__SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_INC_STATS(net, field)				\
+	SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_DEC_STATS(net, field)				\
+	SNMP_DEC_STATS((net)->mib.quic_statistics, field)
+
+int __net_init quic_proc_init(struct net *net);
+void __net_exit quic_proc_fini(struct net *net);
+
 struct quic_internal_crypto_context {
 	struct quic_connection_info	conn_info;
 	struct crypto_skcipher		*header_tfm;
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 468a67836e2f..f94680a3e9e8 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -117,6 +117,12 @@ struct linux_tls_mib {
 	unsigned long	mibs[LINUX_MIB_TLSMAX];
 };
 
+/* Linux QUIC */
+#define LINUX_MIB_QUICMAX	__LINUX_MIB_QUICMAX
+struct linux_quic_mib {
+	unsigned long	mibs[LINUX_MIB_QUICMAX];
+};
+
 #define DEFINE_SNMP_STAT(type, name)	\
 	__typeof__(type) __percpu *name
 #define DEFINE_SNMP_STAT_ATOMIC(type, name)	\
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 4d7470036a8b..ca1e626dbdb4 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -349,4 +349,13 @@ enum
 	__LINUX_MIB_TLSMAX
 };
 
+/* linux QUIC mib definitions */
+enum {
+	LINUX_MIB_QUICNUM = 0,
+	LINUX_MIB_QUICCURRTXSW,			/* QuicCurrTxSw */
+	LINUX_MIB_QUICTXSW,			/* QuicTxSw */
+	LINUX_MIB_QUICTXSWERROR,		/* QuicTxSwError */
+	__LINUX_MIB_QUICMAX
+};
+
 #endif	/* _LINUX_SNMP_H */
diff --git a/net/quic/Makefile b/net/quic/Makefile
index 928239c4d08c..a885cd8bc4e0 100644
--- a/net/quic/Makefile
+++ b/net/quic/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_QUIC) += quic.o
 
-quic-y := quic_main.o
+quic-y := quic_main.o quic_proc.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
index 95de3a961479..4f2484fe43ed 100644
--- a/net/quic/quic_main.c
+++ b/net/quic/quic_main.c
@@ -335,6 +335,8 @@ static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
 	if (rc < 0)
 		goto err_free_ciphers;
 
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSW);
 	return 0;
 
 err_free_ciphers:
@@ -383,6 +385,7 @@ static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
 	crypto_free_aead(crypto_ctx->packet_aead);
 	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
 	kfree(connhash);
+	QUIC_DEC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
 
 	return 0;
 }
@@ -408,6 +411,9 @@ static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 		break;
 	}
 
+	if (rc)
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return rc;
 }
 
@@ -1213,6 +1219,9 @@ static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
 
 out:
+	if (unlikely(ret < 0))
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return ret;
 }
 
@@ -1345,6 +1354,36 @@ static void quic_release(struct sock *sk)
 	release_sock(sk);
 }
 
+static int __net_init quic_init_net(struct net *net)
+{
+	int err;
+
+	net->mib.quic_statistics = alloc_percpu(struct linux_quic_mib);
+	if (!net->mib.quic_statistics)
+		return -ENOMEM;
+
+	err = quic_proc_init(net);
+	if (err)
+		goto err_free_stats;
+
+	return 0;
+
+err_free_stats:
+	free_percpu(net->mib.quic_statistics);
+	return err;
+}
+
+static void __net_exit quic_exit_net(struct net *net)
+{
+	quic_proc_fini(net);
+	free_percpu(net->mib.quic_statistics);
+}
+
+static struct pernet_operations quic_proc_ops = {
+	.init = quic_init_net,
+	.exit = quic_exit_net,
+};
+
 static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 	.name		= "quic-crypto",
 	.owner		= THIS_MODULE,
@@ -1354,6 +1393,12 @@ static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 
 static int __init quic_register(void)
 {
+	int err;
+
+	err = register_pernet_subsys(&quic_proc_ops);
+	if (err)
+		return err;
+
 	udp_register_ulp(&quic_ulp_ops);
 	return 0;
 }
@@ -1361,6 +1406,7 @@ static int __init quic_register(void)
 static void __exit quic_unregister(void)
 {
 	udp_unregister_ulp(&quic_ulp_ops);
+	unregister_pernet_subsys(&quic_proc_ops);
 }
 
 module_init(quic_register);
diff --git a/net/quic/quic_proc.c b/net/quic/quic_proc.c
new file mode 100644
index 000000000000..cb4fe7a589b5
--- /dev/null
+++ b/net/quic/quic_proc.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Meta Platforms, Inc. */
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <net/snmp.h>
+#include <net/quic.h>
+
+#ifdef CONFIG_PROC_FS
+static const struct snmp_mib quic_mib_list[] = {
+	SNMP_MIB_ITEM("QuicCurrTxSw", LINUX_MIB_QUICCURRTXSW),
+	SNMP_MIB_ITEM("QuicTxSw", LINUX_MIB_QUICTXSW),
+	SNMP_MIB_ITEM("QuicTxSwError", LINUX_MIB_QUICTXSWERROR),
+	SNMP_MIB_SENTINEL
+};
+
+static int quic_statistics_seq_show(struct seq_file *seq, void *v)
+{
+	unsigned long buf[LINUX_MIB_QUICMAX] = {};
+	struct net *net = seq->private;
+	int i;
+
+	snmp_get_cpu_field_batch(buf, quic_mib_list, net->mib.quic_statistics);
+	for (i = 0; quic_mib_list[i].name; i++)
+		seq_printf(seq, "%-32s\t%lu\n", quic_mib_list[i].name, buf[i]);
+
+	return 0;
+}
+#endif
+
+int __net_init quic_proc_init(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	if (!proc_create_net_single("quic_stat", 0444, net->proc_net,
+				    quic_statistics_seq_show, NULL))
+		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
+
+	return 0;
+}
+
+void __net_exit quic_proc_fini(struct net *net)
+{
+	remove_proc_entry("quic_stat", net->proc_net);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v2 6/6] Add self tests for ULP operations, flow setup and crypto tests
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
                     ` (4 preceding siblings ...)
  2022-08-17 20:09   ` [net-next v2 5/6] Add flow counters and Tx processing error counter Adel Abouchaev
@ 2022-08-17 20:09   ` Adel Abouchaev
  2022-08-18  2:18   ` [net-next v2 0/6] net: support QUIC crypto Bagas Sanjaya
  2022-08-24 18:29   ` Xin Long
  7 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-17 20:09 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

Add self tests for ULP operations, flow setup and crypto operations.
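
The tests are wired into the net kselftests; a typical invocation might be
(the exact target selection may differ):

  make -C tools/testing/selftests TARGETS=net run_tests

or, after building the net selftests, running quic.sh directly from
tools/testing/selftests/net.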

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Restored the test build. Changed the QUIC context reference variable
names for the keys and iv to match the uAPI.

Updated alignment, added SPDX license line.

v3: Added Chacha20-Poly1305 test.
---
 tools/testing/selftests/net/.gitignore |    4 +-
 tools/testing/selftests/net/Makefile   |    3 +-
 tools/testing/selftests/net/quic.c     | 1153 ++++++++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   46 +
 4 files changed, 1204 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 0e5751af6247..9e4d00e13238 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -39,4 +39,6 @@ toeplitz
 tun
 cmsg_sender
 unix_connect
-tap
\ No newline at end of file
+tap
+unix_connect
+quic
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index c0ee2955fe54..1a6cbb24a636 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -42,6 +42,7 @@ TEST_PROGS += arp_ndisc_evict_nocarrier.sh
 TEST_PROGS += ndisc_unsolicited_na_test.sh
 TEST_PROGS += arp_ndisc_untracked_subnets.sh
 TEST_PROGS += stress_reuseport_listen.sh
+TEST_PROGS += quic.sh
 TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
 TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
 TEST_GEN_FILES =  socket nettest
@@ -57,7 +58,7 @@ TEST_GEN_FILES += ipsec
 TEST_GEN_FILES += ioam6_parser
 TEST_GEN_FILES += gro
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
-TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun tap
+TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun tap quic
 TEST_GEN_FILES += toeplitz
 TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
diff --git a/tools/testing/selftests/net/quic.c b/tools/testing/selftests/net/quic.c
new file mode 100644
index 000000000000..2aa5e1564f5f
--- /dev/null
+++ b/tools/testing/selftests/net/quic.c
@@ -0,0 +1,1153 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/limits.h>
+#include <linux/quic.h>
+#include <linux/socket.h>
+#include <linux/tls.h>
+#include <linux/tcp.h>
+#include <linux/types.h>
+#include <linux/udp.h>
+
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/sendfile.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+
+#include "../kselftest_harness.h"
+
+#define UDP_ULP		105
+
+#ifndef SOL_UDP
+#define SOL_UDP		17
+#endif
+
+// 1. QUIC ULP Registration Test
+
+FIXTURE(quic_ulp)
+{
+	int sfd;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_ulp)
+{
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv4)
+{
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7101,
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv6)
+{
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7102,
+};
+
+FIXTURE_SETUP(quic_ulp)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+FIXTURE_TEARDOWN(quic_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_nonexistent_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "nonexistent", sizeof("nonexistent")), -1);
+	// Without UDP_ULP support the error would be ENOPROTOOPT;
+	// an unknown ULP name gives ENOENT.
+	ASSERT_EQ(errno, ENOENT);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_quic_crypto_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+// 2. QUIC Data Path Operation Tests
+
+#define DO_NOT_SETUP_FLOW 0
+#define SETUP_FLOW 1
+
+#define DO_NOT_USE_CLIENT 0
+#define USE_CLIENT 1
+
+FIXTURE(quic_data)
+{
+	int sfd, c1fd, c2fd;
+	socklen_t len_c1;
+	socklen_t len_c2;
+	socklen_t len_s;
+
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_1;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_2;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_1_net_ns_fd;
+	int client_2_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_data)
+{
+	unsigned int af_client_1;
+	char *client_1_address;
+	unsigned short client_1_port;
+	uint8_t conn_id_1[8];
+	uint8_t conn_1_key[16];
+	uint8_t conn_1_iv[12];
+	uint8_t conn_1_hdr_key[16];
+	size_t conn_id_1_len;
+	bool setup_flow_1;
+	bool use_client_1;
+	unsigned int af_client_2;
+	char *client_2_address;
+	unsigned short client_2_port;
+	uint8_t conn_id_2[8];
+	uint8_t conn_2_key[16];
+	uint8_t conn_2_iv[12];
+	uint8_t conn_2_hdr_key[16];
+	size_t conn_id_2_len;
+	bool setup_flow_2;
+	bool use_client_2;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv4)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.1",
+	.client_1_port = 6667,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6668,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	//.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 6669,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_two_conns)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.1",
+	.client_1_port = 6670,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6671,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6672,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv4_one_conn)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.3",
+	.client_1_port = 6676,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6676,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6677,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv6_one_conn)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.3",
+	.client_1_port = 6678,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14}, .conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6678,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14}, .conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6679,
+};
+
+FIXTURE_SETUP(quic_data)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client_1 == AF_INET) {
+		self->len_c1 = sizeof(self->client_1.addr);
+		self->client_1.addr.sin_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr.sin_addr);
+		self->client_1.addr.sin_port = htons(variant->client_1_port);
+	} else {
+		self->len_c1 = sizeof(self->client_1.addr6);
+		self->client_1.addr6.sin6_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr6.sin6_addr);
+		self->client_1.addr6.sin6_port = htons(variant->client_1_port);
+	}
+
+	if (variant->af_client_2 == AF_INET) {
+		self->len_c2 = sizeof(self->client_2.addr);
+		self->client_2.addr.sin_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr.sin_addr);
+		self->client_2.addr.sin_port = htons(variant->client_2_port);
+	} else {
+		self->len_c2 = sizeof(self->client_2.addr6);
+		self->client_2.addr6.sin6_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr6.sin6_addr);
+		self->client_2.addr6.sin6_port = htons(variant->client_2_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_1_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_1_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns12");
+	self->client_2_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_2_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		self->c1fd = socket(variant->af_client_1, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c1fd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client_1 == AF_INET) {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr,
+					      &self->len_c1), 0);
+		} else {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr6,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr6,
+					      &self->len_c1), 0);
+		}
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		self->c2fd = socket(variant->af_client_2, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c2fd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client_2 == AF_INET) {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr,
+					      &self->len_c2), 0);
+		} else {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr6,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr6,
+					      &self->len_c2), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_data)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+	close(self->c1fd);
+	ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+	close(self->c2fd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_data, send_fail_no_flow)
+{
+	char const *test_str = "test_read";
+	int send_len = 10;
+
+	ASSERT_EQ(strlen(test_str) + 1, send_len);
+	EXPECT_EQ(sendto(self->sfd, test_str, send_len, 0,
+			 &self->client_1.addr, self->len_c1), -1);
+};
+
+TEST_F(quic_data, encrypt_two_conn_gso_1200_iov_2_size_9000_aesgcm128)
+{
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_connection_info conn_1_info;
+	struct quic_connection_info conn_2_info;
+	struct quic_tx_ancillary_data *anc_data;
+	socklen_t recv_addr_len_1;
+	socklen_t recv_addr_len_2;
+	struct cmsghdr *cmsg_hdr;
+	int frag_size = 1200;
+	int send_len = 9000;
+	struct iovec iov[2];
+	int msg_len = 4500;
+	struct msghdr msg;
+	char *test_str_1;
+	char *test_str_2;
+	char *buf_1;
+	char *buf_2;
+	int i;
+
+	test_str_1 = (char *)malloc(9000);
+	test_str_2 = (char *)malloc(9000);
+	memset(test_str_1, 0, 9000);
+	memset(test_str_2, 0, 9000);
+
+	buf_1 = (char *)malloc(10000);
+	buf_2 = (char *)malloc(10000);
+	for (i = 0; i < 9000; i += (1200 - 16)) {	/* GSO size minus 16-byte AEAD tag */
+		test_str_1[i] = 0x40;
+		memcpy(&test_str_1[i + 1], &variant->conn_id_1,
+		       variant->conn_id_1_len);
+		test_str_1[i + 1 + variant->conn_id_1_len] = 0xca;
+
+		test_str_2[i] = 0x40;
+		memcpy(&test_str_2[i + 1], &variant->conn_id_2,
+		       variant->conn_id_2_len);
+		test_str_2[i + 1 + variant->conn_id_2_len] = 0xca;
+	}
+
+	// program the connection into the offload
+	conn_1_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_1_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_1_info.key.conn_id_length = variant->conn_id_1_len;
+	memcpy(conn_1_info.key.conn_id,
+	       &variant->conn_id_1,
+	       variant->conn_id_1_len);
+
+	conn_2_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_2_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_2_info.key.conn_id_length = variant->conn_id_2_len;
+	memcpy(conn_2_info.key.conn_id,
+	       &variant->conn_id_2,
+	       variant->conn_id_2_len);
+
+	memcpy(&conn_1_info.aes_gcm_128.payload_key,
+	       &variant->conn_1_key, 16);
+	memcpy(&conn_1_info.aes_gcm_128.payload_iv,
+	       &variant->conn_1_iv, 12);
+	memcpy(&conn_1_info.aes_gcm_128.header_key,
+	       &variant->conn_1_hdr_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_key,
+	       &variant->conn_2_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_iv,
+	       &variant->conn_2_iv, 12);
+	memcpy(&conn_2_info.aes_gcm_128.header_key,
+	       &variant->conn_2_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+
+	if (variant->setup_flow_1)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)), 0);
+
+	if (variant->setup_flow_2)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)), 0);
+
+	recv_addr_len_1 = self->len_c1;
+	recv_addr_len_2 = self->len_c2;
+
+	iov[0].iov_base = test_str_1;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_1 + 4500;
+	iov[1].iov_len = msg_len;
+
+	msg.msg_name = (self->client_1.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_1.addr
+		       : (void *)&self->client_1.addr6;
+	msg.msg_namelen = self->len_c1;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->flags = 0;
+	anc_data->conn_id_length = variant->conn_id_1_len;
+
+	if (variant->use_client_1)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	iov[0].iov_base = test_str_2;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_2 + 4500;
+	iov[1].iov_len = msg_len;
+	msg.msg_name = (self->client_2.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_2.addr
+		       : (void *)&self->client_2.addr6;
+	msg.msg_namelen = self->len_c2;
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = variant->conn_id_2_len;
+	anc_data->flags = 0;
+
+	if (variant->use_client_2)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		if (variant->af_client_1 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr,
+						   &recv_addr_len_1),
+					  1200);
+				// Validate framing is intact.
+				EXPECT_EQ(memcmp((void *)buf_1 + 1,
+						 &variant->conn_id_1,
+						 variant->conn_id_1_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr6,
+						   &recv_addr_len_1),
+					1200);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr6,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_1, test_str_1, send_len), 0);
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		if (variant->af_client_2 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr6,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr6,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_2, test_str_2, send_len), 0);
+	}
+
+	if (variant->use_client_1 && variant->use_client_2)
+		EXPECT_NE(memcmp(buf_1, buf_2, send_len), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	if (variant->setup_flow_1) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)),
+			  0);
+	}
+	if (variant->setup_flow_2) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)),
+			  0);
+	}
+	free(test_str_1);
+	free(test_str_2);
+	free(buf_1);
+	free(buf_2);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+// 3. QUIC Encryption Tests
+
+FIXTURE(quic_crypto)
+{
+	int sfd, cfd;
+	socklen_t len_c;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_crypto)
+{
+	unsigned int af_client;
+	char *client_address;
+	unsigned short client_port;
+	uint32_t algo;
+	size_t conn_key_len;
+	uint8_t conn_id[8];
+	union {
+		uint8_t conn_key_16[16];
+		uint8_t conn_key_32[32];
+	} conn_key;
+	uint8_t conn_iv[12];
+	union {
+		uint8_t conn_hdr_key_16[16];
+		uint8_t conn_hdr_key_32[32];
+	} conn_hdr_key;
+	size_t conn_id_len;
+	bool setup_flow;
+	bool use_client;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+	char plain[128];
+	size_t plain_len;
+	char match[128];
+	size_t match_len;
+	uint32_t next_pkt_num;
+};
+
+FIXTURE_SETUP(quic_crypto)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client == AF_INET) {
+		self->len_c = sizeof(self->client.addr);
+		self->client.addr.sin_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr.sin_addr);
+		self->client.addr.sin_port = htons(variant->client_port);
+	} else {
+		self->len_c = sizeof(self->client.addr6);
+		self->client.addr6.sin6_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr6.sin6_addr);
+		self->client.addr6.sin6_port = htons(variant->client_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client) {
+		ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+		self->cfd = socket(variant->af_client, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->cfd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client == AF_INET) {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr,
+					      &self->len_c), 0);
+		} else {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr6,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr6,
+					      &self->len_c), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s),
+			  0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s),
+			  0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_crypto)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	close(self->cfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4_aes_gcm_128)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7667,
+	.algo = TLS_CIPHER_AES_GCM_128,
+	.conn_key_len = 16,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {
+		.conn_key_16 = {0x87, 0x71, 0xea, 0x1d,
+				0xfb, 0xbe, 0x7a, 0x45,
+				0xbb, 0xe2, 0x7e, 0xbc,
+				0x0b, 0x53, 0x94, 0x99
+		},
+	},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {
+		.conn_hdr_key_16 = {0xc9, 0x8e, 0xfd, 0xf2,
+				    0x0b, 0x64, 0x8c, 0x57,
+				    0xb5, 0x0a, 0xb2, 0xd2,
+				    0x21, 0xd3, 0x66, 0xa5},
+	},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7669,
+	.plain = { 0x40, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0xca,
+		   // payload
+		   0x02, 0x80, 0xde, 0x40, 0x39, 0x40, 0xf6, 0x00,
+		   0x01, 0x0b, 0x00, 0x0f, 0x65, 0x63, 0x68, 0x6f,
+		   0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+		   0x37, 0x38, 0x39
+	},
+	.plain_len = 37,
+	.match = {
+		   0x46, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0x1c, 0x44, 0xb8, 0x41, 0xbb, 0xcf, 0x6e,
+		   0x0a, 0x2a, 0x24, 0xfb, 0xb4, 0x79, 0x62, 0xea,
+		   0x59, 0x38, 0x1a, 0x0e, 0x50, 0x1e, 0x59, 0xed,
+		   0x3f, 0x8e, 0x7e, 0x5a, 0x70, 0xe4, 0x2a, 0xbc,
+		   0x2a, 0xfa, 0x2b, 0x54, 0xeb, 0x89, 0xc3, 0x2c,
+		   0xb6, 0x8c, 0x1e, 0xab, 0x2d
+	},
+	.match_len = 53,
+	.next_pkt_num = 0x0d65c9,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4_chacha20_poly1305)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7801,
+	.algo = TLS_CIPHER_CHACHA20_POLY1305,
+	.conn_key_len = 32,
+	.conn_id = {},
+	.conn_id_len = 0,
+	.conn_key = {
+		.conn_key_32 = {
+			0x3b, 0xfc, 0xdd, 0xd7, 0x2b, 0xcf, 0x02, 0x54,
+			0x1d, 0x7f, 0xa0, 0xdd, 0x1f, 0x5f, 0x9e, 0xee,
+			0xa8, 0x17, 0xe0, 0x9a, 0x69, 0x63, 0xa0, 0xe6,
+			0xc7, 0xdf, 0x0f, 0x9a, 0x1b, 0xab, 0x90, 0xf2,
+		},
+	},
+	.conn_iv = {
+		0xa6, 0xb5, 0xbc, 0x6a, 0xb7, 0xda, 0xfc, 0xe3,
+		0x0f, 0xff, 0xf5, 0xdd,
+	},
+	.conn_hdr_key = {
+		.conn_hdr_key_32 = {
+			0xd6, 0x59, 0x76, 0x0d, 0x2b, 0xa4, 0x34, 0xa2,
+			0x26, 0xfd, 0x37, 0xb3, 0x5c, 0x69, 0xe2, 0xda,
+			0x82, 0x11, 0xd1, 0x0c, 0x4f, 0x12, 0x53, 0x87,
+			0x87, 0xd6, 0x56, 0x45, 0xd5, 0xd1, 0xb8, 0xe2,
+		},
+	},
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7802,
+	.plain = { 0x42, 0x00, 0xbf, 0xf4, 0x01 },
+	.plain_len = 5,
+	.match = { 0x55, 0x58, 0xb1, 0xc6, 0x0a, 0xe7, 0xb6, 0xb9,
+		   0x32, 0xbc, 0x27, 0xd7, 0x86, 0xf4, 0xbc, 0x2b,
+		   0xb2, 0x0f, 0x21, 0x62, 0xba },
+	.match_len = 21,
+	.next_pkt_num = 0x2700bff5,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6_aes_gcm_128)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7673,
+	.algo = TLS_CIPHER_AES_GCM_128,
+	.conn_key_len = 16,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {
+		.conn_key_16 = {0x87, 0x71, 0xea, 0x1d,
+				0xfb, 0xbe, 0x7a, 0x45,
+				0xbb, 0xe2, 0x7e, 0xbc,
+				0x0b, 0x53, 0x94, 0x99
+		},
+	},
+	.conn_iv = {0x3a, 0xa7, 0x46, 0x72, 0xe9, 0x83, 0x6b, 0x55, 0xda,
+		0x66, 0x7b, 0xda},
+	.conn_hdr_key = {
+		.conn_hdr_key_16 = {0xc9, 0x8e, 0xfd, 0xf2,
+				    0x0b, 0x64, 0x8c, 0x57,
+				    0xb5, 0x0a, 0xb2, 0xd2,
+				    0x21, 0xd3, 0x66, 0xa5},
+	},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7675,
+	.plain = { 0x40, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0xca,
+		   // Payload
+		   0x02, 0x80, 0xde, 0x40, 0x39, 0x40, 0xf6, 0x00,
+		   0x01, 0x0b, 0x00, 0x0f, 0x65, 0x63, 0x68, 0x6f,
+		   0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+		   0x37, 0x38, 0x39
+	},
+	.plain_len = 37,
+	.match = {
+		   0x46, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0x1c, 0x44, 0xb8, 0x41, 0xbb, 0xcf, 0x6e,
+		   0x0a, 0x2a, 0x24, 0xfb, 0xb4, 0x79, 0x62, 0xea,
+		   0x59, 0x38, 0x1a, 0x0e, 0x50, 0x1e, 0x59, 0xed,
+		   0x3f, 0x8e, 0x7e, 0x5a, 0x70, 0xe4, 0x2a, 0xbc,
+		   0x2a, 0xfa, 0x2b, 0x54, 0xeb, 0x89, 0xc3, 0x2c,
+		   0xb6, 0x8c, 0x1e, 0xab, 0x2d
+	},
+	.match_len = 53,
+	.next_pkt_num = 0x0d65c9,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6_chacha20_poly1305)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7803,
+	.algo = TLS_CIPHER_CHACHA20_POLY1305,
+	.conn_key_len = 32,
+	.conn_id = {},
+	.conn_id_len = 0,
+	.conn_key = {
+		.conn_key_32 = {
+			0x3b, 0xfc, 0xdd, 0xd7, 0x2b, 0xcf, 0x02, 0x54,
+			0x1d, 0x7f, 0xa0, 0xdd, 0x1f, 0x5f, 0x9e, 0xee,
+			0xa8, 0x17, 0xe0, 0x9a, 0x69, 0x63, 0xa0, 0xe6,
+			0xc7, 0xdf, 0x0f, 0x9a, 0x1b, 0xab, 0x90, 0xf2,
+		},
+	},
+	.conn_iv = {
+		0xa6, 0xb5, 0xbc, 0x6a, 0xb7, 0xda, 0xfc, 0xe3,
+		0x0f, 0xff, 0xf5, 0xdd,
+	},
+	.conn_hdr_key = {
+		.conn_hdr_key_32 = {
+			0xd6, 0x59, 0x76, 0x0d, 0x2b, 0xa4, 0x34, 0xa2,
+			0x26, 0xfd, 0x37, 0xb3, 0x5c, 0x69, 0xe2, 0xda,
+			0x82, 0x11, 0xd1, 0x0c, 0x4f, 0x12, 0x53, 0x87,
+			0x87, 0xd6, 0x56, 0x45, 0xd5, 0xd1, 0xb8, 0xe2,
+		},
+	},
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7804,
+	.plain = { 0x42, 0x00, 0xbf, 0xf4, 0x01 },
+	.plain_len = 5,
+	.match = { 0x55, 0x58, 0xb1, 0xc6, 0x0a, 0xe7, 0xb6, 0xb9,
+		   0x32, 0xbc, 0x27, 0xd7, 0x86, 0xf4, 0xbc, 0x2b,
+		   0xb2, 0x0f, 0x21, 0x62, 0xba },
+	.match_len = 21,
+	.next_pkt_num = 0x2700bff5,
+};
+
+TEST_F(quic_crypto, encrypt_test_vector_single_flow_gso_in_control)
+{
+	uint8_t cmsg_buf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))
+			 + CMSG_SPACE(sizeof(uint16_t))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	uint16_t frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	int wrong_frag_size = 26;
+	socklen_t recv_addr_len;
+	struct iovec iov;
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(9000);
+	conn_info.cipher_type = variant->algo;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	ASSERT_TRUE(variant->algo == TLS_CIPHER_AES_GCM_128 ||
+		    variant->algo == TLS_CIPHER_CHACHA20_POLY1305);
+	switch (variant->algo) {
+	case TLS_CIPHER_AES_GCM_128:
+		memcpy(&conn_info.aes_gcm_128.payload_key,
+		       &variant->conn_key, 16);
+		memcpy(&conn_info.aes_gcm_128.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.aes_gcm_128.header_key,
+		       &variant->conn_hdr_key, 16);
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		memcpy(&conn_info.chacha20_poly1305.payload_key,
+		       &variant->conn_key, 32);
+		memcpy(&conn_info.chacha20_poly1305.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.chacha20_poly1305.header_key,
+		       &variant->conn_hdr_key, 32);
+		break;
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &wrong_frag_size,
+			     sizeof(wrong_frag_size)), 0);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+	iov.iov_base = (void *)variant->plain;
+	iov.iov_len = variant->plain_len;
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(struct quic_tx_ancillary_data));
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = variant->next_pkt_num;
+	anc_data->conn_id_length = variant->conn_id_len;
+	cmsg_hdr = CMSG_NXTHDR(&msg, cmsg_hdr);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_SEGMENT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+	memcpy(CMSG_DATA(cmsg_hdr), (void *)&frag_size, sizeof(frag_size));
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), variant->plain_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr, &recv_addr_len),
+			  variant->match_len);
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  variant->match_len);
+	}
+	EXPECT_EQ(memcmp(buf, variant->match, variant->match_len), 0);
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_crypto, encrypt_test_vector_single_flow_gso_in_setsockopt)
+{
+	uint8_t cmsg_buf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	socklen_t recv_addr_len;
+	struct iovec iov;
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(9000);
+	conn_info.cipher_type = variant->algo;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	ASSERT_TRUE(variant->algo == TLS_CIPHER_AES_GCM_128 ||
+		    variant->algo == TLS_CIPHER_CHACHA20_POLY1305);
+	switch (variant->algo) {
+	case TLS_CIPHER_AES_GCM_128:
+		memcpy(&conn_info.aes_gcm_128.payload_key,
+		       &variant->conn_key, 16);
+		memcpy(&conn_info.aes_gcm_128.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.aes_gcm_128.header_key,
+		       &variant->conn_hdr_key, 16);
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		memcpy(&conn_info.chacha20_poly1305.payload_key,
+		       &variant->conn_key, 32);
+		memcpy(&conn_info.chacha20_poly1305.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.chacha20_poly1305.header_key,
+		       &variant->conn_hdr_key, 32);
+		break;
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+	iov.iov_base = (void *)variant->plain;
+	iov.iov_len = variant->plain_len;
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(struct quic_tx_ancillary_data));
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = variant->next_pkt_num;
+	anc_data->conn_id_length = variant->conn_id_len;
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), variant->plain_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr, &recv_addr_len),
+			  variant->match_len);
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  variant->match_len);
+	}
+	EXPECT_EQ(memcmp(buf, variant->match, variant->match_len), 0);
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/quic.sh b/tools/testing/selftests/net/quic.sh
new file mode 100755
index 000000000000..8ff8bc494671
--- /dev/null
+++ b/tools/testing/selftests/net/quic.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+sudo ip netns add ns11
+sudo ip netns add ns12
+sudo ip netns add ns2
+sudo ip link add veth11 type veth peer name br-veth11
+sudo ip link add veth12 type veth peer name br-veth12
+sudo ip link add veth2 type veth peer name br-veth2
+sudo ip link set veth11 netns ns11
+sudo ip link set veth12 netns ns12
+sudo ip link set veth2 netns ns2
+sudo ip netns exec ns11 ip addr add 10.0.0.1/24 dev veth11
+sudo ip netns exec ns11 ip addr add ::ffff:10.0.0.1/96 dev veth11
+sudo ip netns exec ns11 ip addr add 2001::1/64 dev veth11
+sudo ip netns exec ns12 ip addr add 10.0.0.3/24 dev veth12
+sudo ip netns exec ns12 ip addr add ::ffff:10.0.0.3/96 dev veth12
+sudo ip netns exec ns12 ip addr add 2001::3/64 dev veth12
+sudo ip netns exec ns2 ip addr add 10.0.0.2/24 dev veth2
+sudo ip netns exec ns2 ip addr add ::ffff:10.0.0.2/96 dev veth2
+sudo ip netns exec ns2 ip addr add 2001::2/64 dev veth2
+sudo ip link add name br1 type bridge forward_delay 0
+sudo ip link set br1 up
+sudo ip link set br-veth11 up
+sudo ip link set br-veth12 up
+sudo ip link set br-veth2 up
+sudo ip netns exec ns11 ip link set veth11 up
+sudo ip netns exec ns12 ip link set veth12 up
+sudo ip netns exec ns2 ip link set veth2 up
+sudo ip link set br-veth11 master br1
+sudo ip link set br-veth12 master br1
+sudo ip link set br-veth2 master br1
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+
+printf "%s" "Waiting for bridge to start forwarding ..."
+while ! timeout 0.5 sudo ip netns exec ns2 ping -c 1 -n 2001::1 &> /dev/null
+do
+	printf "%c" "."
+done
+printf "\n%s\n"  "Bridge is operational"
+
+sudo ./quic
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+sudo ip netns delete ns2
+sudo ip netns delete ns12
+sudo ip netns delete ns11
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [net-next v2 0/6] net: support QUIC crypto
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
                     ` (5 preceding siblings ...)
  2022-08-17 20:09   ` [net-next v2 6/6] Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
@ 2022-08-18  2:18   ` Bagas Sanjaya
  2022-08-24 18:29   ` Xin Long
  7 siblings, 0 replies; 77+ messages in thread
From: Bagas Sanjaya @ 2022-08-18  2:18 UTC (permalink / raw)
  To: Adel Abouchaev, kuba
  Cc: davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest

On 8/18/22 03:09, Adel Abouchaev wrote:
> QUIC requires end to end encryption of the data. The application usually
> prepares the data in clear text, encrypts and calls send() which implies
> multiple copies of the data before the packets hit the networking stack.
> Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
> pressure by reducing the number of copies.
> 
> The scope of kernel support is limited to the symmetric cryptography,
> leaving the handshake to the user space library. For QUIC in particular,
> the application packets that require symmetric cryptography are the 1RTT
> packets with short headers. Kernel will encrypt the application packets
> on transmission and decrypt on receive. This series implements Tx only,
> because in QUIC server applications Tx outweighs Rx by orders of
> magnitude.
> 
> Supporting the combination of QUIC and GSO requires the application to
> correctly place the data and the kernel to correctly slice it. The
> encryption process appends an arbitrary number of bytes (tag) to the end
> of the message to authenticate it. The GSO value should include this
> overhead, the offload would then subtract the tag size to parse the
> input on Tx before chunking and encrypting it.
> 
> With the kernel cryptography, the buffer copy operation is conjoined
> with the encryption operation. The memory bandwidth is reduced by 5-8%.
> When devices supporting QUIC encryption in hardware come to the market,
> we will be able to free further 7% of CPU utilization which is used
> today for crypto operations.
> 
> Adel Abouchaev (6):
>   Documentation on QUIC kernel Tx crypto.
>   Define QUIC specific constants, control and data plane structures
>   Add UDP ULP operations, initialization and handling prototype
>     functions.
>   Implement QUIC offload functions
>   Add flow counters and Tx processing error counter
>   Add self tests for ULP operations, flow setup and crypto tests
> 
>  Documentation/networking/index.rst     |    1 +
>  Documentation/networking/quic.rst      |  185 ++++
>  include/net/inet_sock.h                |    2 +
>  include/net/netns/mib.h                |    3 +
>  include/net/quic.h                     |   63 ++
>  include/net/snmp.h                     |    6 +
>  include/net/udp.h                      |   33 +
>  include/uapi/linux/quic.h              |   60 +
>  include/uapi/linux/snmp.h              |    9 +
>  include/uapi/linux/udp.h               |    4 +
>  net/Kconfig                            |    1 +
>  net/Makefile                           |    1 +
>  net/ipv4/Makefile                      |    3 +-
>  net/ipv4/udp.c                         |   15 +
>  net/ipv4/udp_ulp.c                     |  192 ++++
>  net/quic/Kconfig                       |   16 +
>  net/quic/Makefile                      |    8 +
>  net/quic/quic_main.c                   | 1417 ++++++++++++++++++++++++
>  net/quic/quic_proc.c                   |   45 +
>  tools/testing/selftests/net/.gitignore |    4 +-
>  tools/testing/selftests/net/Makefile   |    3 +-
>  tools/testing/selftests/net/quic.c     | 1153 +++++++++++++++++++
>  tools/testing/selftests/net/quic.sh    |   46 +
>  23 files changed, 3267 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/networking/quic.rst
>  create mode 100644 include/net/quic.h
>  create mode 100644 include/uapi/linux/quic.h
>  create mode 100644 net/ipv4/udp_ulp.c
>  create mode 100644 net/quic/Kconfig
>  create mode 100644 net/quic/Makefile
>  create mode 100644 net/quic/quic_main.c
>  create mode 100644 net/quic/quic_proc.c
>  create mode 100644 tools/testing/selftests/net/quic.c
>  create mode 100755 tools/testing/selftests/net/quic.sh
> 
> 
> base-commit: fd78d07c7c35de260eb89f1be4a1e7487b8092ad

Applied, but based on f86d1fbbe78588 ("Merge tag 'net-next-6.0' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") instead,
since this series fails to apply on the specified base-commit tag.

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next v2 1/6] Documentation on QUIC kernel Tx crypto.
  2022-08-17 20:09   ` [net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-08-18  2:53     ` Bagas Sanjaya
  0 siblings, 0 replies; 77+ messages in thread
From: Bagas Sanjaya @ 2022-08-18  2:53 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: kuba, davem, edumazet, pabeni, corbet, dsahern, shuah, imagedong,
	netdev, linux-doc, linux-kselftest, kernel test robot

[-- Attachment #1: Type: text/plain, Size: 254 bytes --]

On Wed, Aug 17, 2022 at 01:09:35PM -0700, Adel Abouchaev wrote:
> Add documentation for kernel QUIC code.
> 

The documentation LGTM.

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next v2 0/6] net: support QUIC crypto
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
                     ` (6 preceding siblings ...)
  2022-08-18  2:18   ` [net-next v2 0/6] net: support QUIC crypto Bagas Sanjaya
@ 2022-08-24 18:29   ` Xin Long
  2022-08-24 19:52     ` Matt Joras
  2022-08-24 23:09     ` Adel Abouchaev
  7 siblings, 2 replies; 77+ messages in thread
From: Xin Long @ 2022-08-24 18:29 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Jonathan Corbet, David Ahern, shuah, imagedong, network dev,
	linux-doc, linux-kselftest

On Wed, Aug 17, 2022 at 4:11 PM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
>
> QUIC requires end to end encryption of the data. The application usually
> prepares the data in clear text, encrypts and calls send() which implies
> multiple copies of the data before the packets hit the networking stack.
> Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
> pressure by reducing the number of copies.
>
> The scope of kernel support is limited to the symmetric cryptography,
> leaving the handshake to the user space library. For QUIC in particular,
> the application packets that require symmetric cryptography are the 1RTT
> packets with short headers. Kernel will encrypt the application packets
> on transmission and decrypt on receive. This series implements Tx only,
> because in QUIC server applications Tx outweighs Rx by orders of
> magnitude.
>
> Supporting the combination of QUIC and GSO requires the application to
> correctly place the data and the kernel to correctly slice it. The
> encryption process appends an arbitrary number of bytes (tag) to the end
> of the message to authenticate it. The GSO value should include this
> overhead, the offload would then subtract the tag size to parse the
> input on Tx before chunking and encrypting it.
>
> With the kernel cryptography, the buffer copy operation is conjoined
> with the encryption operation. The memory bandwidth is reduced by 5-8%.
> When devices supporting QUIC encryption in hardware come to the market,
> we will be able to free further 7% of CPU utilization which is used
> today for crypto operations.
>
> Adel Abouchaev (6):
>   Documentation on QUIC kernel Tx crypto.
>   Define QUIC specific constants, control and data plane structures
>   Add UDP ULP operations, initialization and handling prototype
>     functions.
>   Implement QUIC offload functions
>   Add flow counters and Tx processing error counter
>   Add self tests for ULP operations, flow setup and crypto tests
>
>  Documentation/networking/index.rst     |    1 +
>  Documentation/networking/quic.rst      |  185 ++++
>  include/net/inet_sock.h                |    2 +
>  include/net/netns/mib.h                |    3 +
>  include/net/quic.h                     |   63 ++
>  include/net/snmp.h                     |    6 +
>  include/net/udp.h                      |   33 +
>  include/uapi/linux/quic.h              |   60 +
>  include/uapi/linux/snmp.h              |    9 +
>  include/uapi/linux/udp.h               |    4 +
>  net/Kconfig                            |    1 +
>  net/Makefile                           |    1 +
>  net/ipv4/Makefile                      |    3 +-
>  net/ipv4/udp.c                         |   15 +
>  net/ipv4/udp_ulp.c                     |  192 ++++
>  net/quic/Kconfig                       |   16 +
>  net/quic/Makefile                      |    8 +
>  net/quic/quic_main.c                   | 1417 ++++++++++++++++++++++++
>  net/quic/quic_proc.c                   |   45 +
>  tools/testing/selftests/net/.gitignore |    4 +-
>  tools/testing/selftests/net/Makefile   |    3 +-
>  tools/testing/selftests/net/quic.c     | 1153 +++++++++++++++++++
>  tools/testing/selftests/net/quic.sh    |   46 +
>  23 files changed, 3267 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/networking/quic.rst
>  create mode 100644 include/net/quic.h
>  create mode 100644 include/uapi/linux/quic.h
>  create mode 100644 net/ipv4/udp_ulp.c
>  create mode 100644 net/quic/Kconfig
>  create mode 100644 net/quic/Makefile
>  create mode 100644 net/quic/quic_main.c
>  create mode 100644 net/quic/quic_proc.c
>  create mode 100644 tools/testing/selftests/net/quic.c
>  create mode 100755 tools/testing/selftests/net/quic.sh
>
>
> base-commit: fd78d07c7c35de260eb89f1be4a1e7487b8092ad
> --
> 2.30.2
>
Hi, Adel,

I don't see how the key update(rfc9001#section-6) is handled on the TX
path, which is not using TLS Key update, and "Key Phase" indicates
which key will be used after rekeying. Also, I think it is almost
impossible to handle the peer rekeying on the RX path either based on
your current model in the future.

The patch seems to get the crypto_ctx by doing a connection hash table
lookup in the sendmsg(), which is not good from the performance side.
One QUIC connection can go over multiple UDP sockets, but I don't
think one socket can be used by multiple QUIC connections. So why not
save the ctx in the socket instead?

The patch is to reduce the copying operations between user space and
the kernel. I might miss something in your user space code, but the
msg to send is *already packed* into the Stream Frame in user space,
what's the difference if you encrypt it in userspace and then
sendmsg(udp_sk) with zero-copy to the kernel.

Didn't really understand the "GSO" you mentioned, as I don't see any
code about kernel GSO, I guess it's just "Fragment size", right?
BTW, it's not common to use "//" for the kernel annotation.

I'm not sure if it's worth adding a ULP layer over UDP for this QUIC
TX only. Honestly, I'm more supporting doing a full QUIC stack in the
kernel independently with socket APIs to use it:
https://github.com/lxin/tls_hs.

Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [net-next] Fix reinitialization of TEST_PROGS in net self tests.
       [not found] <Adel Abouchaev <adel.abushaev@gmail.com>
                   ` (4 preceding siblings ...)
  2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
@ 2022-08-24 18:43 ` Adel Abouchaev
  2022-08-24 20:12   ` Shuah Khan
  2022-08-25 20:30   ` patchwork-bot+netdevbpf
  2022-09-07  0:49 ` [net-next v3 0/6] net: support QUIC crypto Adel Abouchaev
  2022-09-09  0:12 ` [net-next v4 0/6] net: support QUIC crypto Adel Abouchaev
  7 siblings, 2 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-24 18:43 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, shuah, netdev, linux-kselftest

Fix reinitialization of TEST_PROGS in net self tests.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 tools/testing/selftests/net/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 11a288b67e2f..4a5978eab848 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -42,7 +42,7 @@ TEST_PROGS += arp_ndisc_evict_nocarrier.sh
 TEST_PROGS += ndisc_unsolicited_na_test.sh
 TEST_PROGS += arp_ndisc_untracked_subnets.sh
 TEST_PROGS += stress_reuseport_listen.sh
-TEST_PROGS := l2_tos_ttl_inherit.sh
+TEST_PROGS += l2_tos_ttl_inherit.sh
 TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
 TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
 TEST_GEN_FILES =  socket nettest
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [net-next v2 0/6] net: support QUIC crypto
  2022-08-24 18:29   ` Xin Long
@ 2022-08-24 19:52     ` Matt Joras
  2022-08-24 23:09     ` Adel Abouchaev
  1 sibling, 0 replies; 77+ messages in thread
From: Matt Joras @ 2022-08-24 19:52 UTC (permalink / raw)
  To: Xin Long
  Cc: Adel Abouchaev, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Jonathan Corbet, David Ahern, shuah, imagedong, network dev,
	linux-doc, linux-kselftest


> On Aug 24, 2022, at 11:29 AM, Xin Long <lucien.xin@gmail.com> wrote:
> 
> On Wed, Aug 17, 2022 at 4:11 PM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
>> 
>> QUIC requires end to end encryption of the data. The application usually
>> prepares the data in clear text, encrypts and calls send() which implies
>> multiple copies of the data before the packets hit the networking stack.
>> Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
>> pressure by reducing the number of copies.
>> 
>> The scope of kernel support is limited to the symmetric cryptography,
>> leaving the handshake to the user space library. For QUIC in particular,
>> the application packets that require symmetric cryptography are the 1RTT
>> packets with short headers. Kernel will encrypt the application packets
>> on transmission and decrypt on receive. This series implements Tx only,
>> because in QUIC server applications Tx outweighs Rx by orders of
>> magnitude.
>> 
>> Supporting the combination of QUIC and GSO requires the application to
>> correctly place the data and the kernel to correctly slice it. The
>> encryption process appends an arbitrary number of bytes (tag) to the end
>> of the message to authenticate it. The GSO value should include this
>> overhead, the offload would then subtract the tag size to parse the
>> input on Tx before chunking and encrypting it.
>> 
>> With the kernel cryptography, the buffer copy operation is conjoined
>> with the encryption operation. The memory bandwidth is reduced by 5-8%.
>> When devices supporting QUIC encryption in hardware come to the market,
>> we will be able to free further 7% of CPU utilization which is used
>> today for crypto operations.
>> 
>> Adel Abouchaev (6):
>>  Documentation on QUIC kernel Tx crypto.
>>  Define QUIC specific constants, control and data plane structures
>>  Add UDP ULP operations, initialization and handling prototype
>>    functions.
>>  Implement QUIC offload functions
>>  Add flow counters and Tx processing error counter
>>  Add self tests for ULP operations, flow setup and crypto tests
>> 
>> Documentation/networking/index.rst     |    1 +
>> Documentation/networking/quic.rst      |  185 ++++
>> include/net/inet_sock.h                |    2 +
>> include/net/netns/mib.h                |    3 +
>> include/net/quic.h                     |   63 ++
>> include/net/snmp.h                     |    6 +
>> include/net/udp.h                      |   33 +
>> include/uapi/linux/quic.h              |   60 +
>> include/uapi/linux/snmp.h              |    9 +
>> include/uapi/linux/udp.h               |    4 +
>> net/Kconfig                            |    1 +
>> net/Makefile                           |    1 +
>> net/ipv4/Makefile                      |    3 +-
>> net/ipv4/udp.c                         |   15 +
>> net/ipv4/udp_ulp.c                     |  192 ++++
>> net/quic/Kconfig                       |   16 +
>> net/quic/Makefile                      |    8 +
>> net/quic/quic_main.c                   | 1417 ++++++++++++++++++++++++
>> net/quic/quic_proc.c                   |   45 +
>> tools/testing/selftests/net/.gitignore |    4 +-
>> tools/testing/selftests/net/Makefile   |    3 +-
>> tools/testing/selftests/net/quic.c     | 1153 +++++++++++++++++++
>> tools/testing/selftests/net/quic.sh    |   46 +
>> 23 files changed, 3267 insertions(+), 3 deletions(-)
>> create mode 100644 Documentation/networking/quic.rst
>> create mode 100644 include/net/quic.h
>> create mode 100644 include/uapi/linux/quic.h
>> create mode 100644 net/ipv4/udp_ulp.c
>> create mode 100644 net/quic/Kconfig
>> create mode 100644 net/quic/Makefile
>> create mode 100644 net/quic/quic_main.c
>> create mode 100644 net/quic/quic_proc.c
>> create mode 100644 tools/testing/selftests/net/quic.c
>> create mode 100755 tools/testing/selftests/net/quic.sh
>> 
>> 
>> base-commit: fd78d07c7c35de260eb89f1be4a1e7487b8092ad
>> --
>> 2.30.2
>> 
> Hi, Adel,
> 
> I don't see how the key update(rfc9001#section-6) is handled on the TX
> path, which is not using TLS Key update, and "Key Phase" indicates
> which key will be used after rekeying. Also, I think it is almost
> impossible to handle the peer rekeying on the RX path either based on
> your current model in the future.
Key updates are not something that needs to be handled by the kernel in this
model. I.e. a key update will be processed as normal by the userspace QUIC code and
the sockets will have to be re-associated with the new keying material.
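
For illustration only, a rough sketch of what that re-association could
look like with the setsockopt interface from this series. The header
locations, the delete-then-add sequence, and the new_key/new_iv buffers,
which would come from the userspace handshake library, are assumptions
rather than something defined by the series:

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>		/* IPPROTO_UDP */
#include <linux/udp.h>		/* UDP_QUIC_*_TX_CONNECTION (this series) */
#include <linux/quic.h>		/* struct quic_connection_info (this series) */

/* Sketch only: swap the Tx flow over to the next-generation 1-RTT key
 * after a userspace-driven key update.  Per RFC 9001, section 6, the
 * packet protection key and IV change but the header protection key
 * does not.
 */
static int quic_rekey_tx(int fd, struct quic_connection_info *info,
			 const uint8_t new_key[16], const uint8_t new_iv[12])
{
	if (setsockopt(fd, IPPROTO_UDP, UDP_QUIC_DEL_TX_CONNECTION,
		       info, sizeof(*info)) < 0)
		return -1;

	memcpy(&info->aes_gcm_128.payload_key, new_key, 16);
	memcpy(&info->aes_gcm_128.payload_iv, new_iv, 12);

	return setsockopt(fd, IPPROTO_UDP, UDP_QUIC_ADD_TX_CONNECTION,
			  info, sizeof(*info));
}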
> 
> The patch seems to get the crypto_ctx by doing a connection hash table
> lookup in the sendmsg(), which is not good from the performance side.
> One QUIC connection can go over multiple UDP sockets, but I don't
> think one socket can be used by multiple QUIC connections. So why not
> save the ctx in the socket instead?
There’s nothing preventing a single socket or UDP/IP tuple from being used
by multiple QUIC connections. This is achievable due to both endpoints having
CIDs. Note that it is not uncommon for QUIC deployments to use a single socket for
all connections, rather than the TCP listen/accept model. That being said, it
would be nice to be able to avoid the lookup cost when using a connected socket.

> 
> The patch is to reduce the copying operations between user space and
> the kernel. I might miss something in your user space code, but the
> msg to send is *already packed* into the Stream Frame in user space,
> what's the difference if you encrypt it in userspace and then
> sendmsg(udp_sk) with zero-copy to the kernel.
I would not say that reducing copy operations is the primary goal of this
work. There are already ways to achieve minimal copy operations for UDP from
userspace. 
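
For instance, with the packet already encrypted by the QUIC library,
MSG_ZEROCOPY avoids the payload copy into the kernel. A minimal sketch,
assuming SO_ZEROCOPY has already been enabled on the socket and that the
buffer stays valid until the completion notification arrives on the
error queue (the completion handling is not shown):

#include <sys/socket.h>
#include <sys/uio.h>

/* Sketch of the existing userspace path: hand an already-encrypted
 * QUIC packet to the kernel without copying the payload.
 */
static ssize_t send_ciphertext_zerocopy(int fd, void *pkt, size_t len,
					const struct sockaddr *dst,
					socklen_t dst_len)
{
	struct iovec iov = { .iov_base = pkt, .iov_len = len };
	struct msghdr msg = {
		.msg_name = (void *)dst,
		.msg_namelen = dst_len,
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};

	return sendmsg(fd, &msg, MSG_ZEROCOPY);
}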
> 
> Didn't really understand the "GSO" you mentioned, as I don't see any
> code about kernel GSO, I guess it's just "Fragment size", right?
> BTW, it's not common to use "//" for the kernel annotation.
> 
> I'm not sure if it's worth adding a ULP layer over UDP for this QUIC
> TX only. Honestly, I'm more supporting doing a full QUIC stack in the
> kernel independently with socket APIs to use it:
> https://github.com/lxin/tls_hs.
A full QUIC stack in the kernel with associated socket APIs is solving a
different problem than this work. Having an API to offload crypto operations of QUIC
allows for the choice of many different QUIC implementations in userspace while
potentially taking advantage of offloading the main CPU cost of an encrypted protocol.
> 
> Thanks.
> 

Best,
Matt Joras

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next] Fix reinitialization of TEST_PROGS in net self tests.
  2022-08-24 18:43 ` [net-next] Fix reinitialization of TEST_PROGS in net self tests Adel Abouchaev
@ 2022-08-24 20:12   ` Shuah Khan
  2022-08-25 20:30   ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 77+ messages in thread
From: Shuah Khan @ 2022-08-24 20:12 UTC (permalink / raw)
  To: Adel Abouchaev, davem, edumazet, kuba, pabeni, shuah, netdev,
	linux-kselftest, Shuah Khan

On 8/24/22 12:43 PM, Adel Abouchaev wrote:
> Fix reinitialization of TEST_PROGS in net self tests.
> 
> Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
> ---
>   tools/testing/selftests/net/Makefile | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index 11a288b67e2f..4a5978eab848 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -42,7 +42,7 @@ TEST_PROGS += arp_ndisc_evict_nocarrier.sh
>   TEST_PROGS += ndisc_unsolicited_na_test.sh
>   TEST_PROGS += arp_ndisc_untracked_subnets.sh
>   TEST_PROGS += stress_reuseport_listen.sh
> -TEST_PROGS := l2_tos_ttl_inherit.sh
> +TEST_PROGS += l2_tos_ttl_inherit.sh
>   TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
>   TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
>   TEST_GEN_FILES =  socket nettest
> 

Thank you for fixing this. I am seeing these types of bugs recently.
Have to be careful with := vs +=.

Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next v2 0/6] net: support QUIC crypto
  2022-08-24 18:29   ` Xin Long
  2022-08-24 19:52     ` Matt Joras
@ 2022-08-24 23:09     ` Adel Abouchaev
  2022-09-25 18:04       ` Willem de Bruijn
  1 sibling, 1 reply; 77+ messages in thread
From: Adel Abouchaev @ 2022-08-24 23:09 UTC (permalink / raw)
  To: Xin Long
  Cc: Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Jonathan Corbet, David Ahern, shuah, imagedong, network dev,
	linux-doc, linux-kselftest


On 8/24/22 11:29 AM, Xin Long wrote:
> On Wed, Aug 17, 2022 at 4:11 PM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
>> QUIC requires end to end encryption of the data. The application usually
>> prepares the data in clear text, encrypts and calls send() which implies
>> multiple copies of the data before the packets hit the networking stack.
>> Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
>> pressure by reducing the number of copies.
>>
>> The scope of kernel support is limited to the symmetric cryptography,
>> leaving the handshake to the user space library. For QUIC in particular,
>> the application packets that require symmetric cryptography are the 1RTT
>> packets with short headers. Kernel will encrypt the application packets
>> on transmission and decrypt on receive. This series implements Tx only,
>> because in QUIC server applications Tx outweighs Rx by orders of
>> magnitude.
>>
>> Supporting the combination of QUIC and GSO requires the application to
>> correctly place the data and the kernel to correctly slice it. The
>> encryption process appends an arbitrary number of bytes (tag) to the end
>> of the message to authenticate it. The GSO value should include this
>> overhead, the offload would then subtract the tag size to parse the
>> input on Tx before chunking and encrypting it.
>>
>> With the kernel cryptography, the buffer copy operation is conjoined
>> with the encryption operation. The memory bandwidth is reduced by 5-8%.
>> When devices supporting QUIC encryption in hardware come to the market,
>> we will be able to free further 7% of CPU utilization which is used
>> today for crypto operations.
>>
>> Adel Abouchaev (6):
>>    Documentation on QUIC kernel Tx crypto.
>>    Define QUIC specific constants, control and data plane structures
>>    Add UDP ULP operations, initialization and handling prototype
>>      functions.
>>    Implement QUIC offload functions
>>    Add flow counters and Tx processing error counter
>>    Add self tests for ULP operations, flow setup and crypto tests
>>
>>   Documentation/networking/index.rst     |    1 +
>>   Documentation/networking/quic.rst      |  185 ++++
>>   include/net/inet_sock.h                |    2 +
>>   include/net/netns/mib.h                |    3 +
>>   include/net/quic.h                     |   63 ++
>>   include/net/snmp.h                     |    6 +
>>   include/net/udp.h                      |   33 +
>>   include/uapi/linux/quic.h              |   60 +
>>   include/uapi/linux/snmp.h              |    9 +
>>   include/uapi/linux/udp.h               |    4 +
>>   net/Kconfig                            |    1 +
>>   net/Makefile                           |    1 +
>>   net/ipv4/Makefile                      |    3 +-
>>   net/ipv4/udp.c                         |   15 +
>>   net/ipv4/udp_ulp.c                     |  192 ++++
>>   net/quic/Kconfig                       |   16 +
>>   net/quic/Makefile                      |    8 +
>>   net/quic/quic_main.c                   | 1417 ++++++++++++++++++++++++
>>   net/quic/quic_proc.c                   |   45 +
>>   tools/testing/selftests/net/.gitignore |    4 +-
>>   tools/testing/selftests/net/Makefile   |    3 +-
>>   tools/testing/selftests/net/quic.c     | 1153 +++++++++++++++++++
>>   tools/testing/selftests/net/quic.sh    |   46 +
>>   23 files changed, 3267 insertions(+), 3 deletions(-)
>>   create mode 100644 Documentation/networking/quic.rst
>>   create mode 100644 include/net/quic.h
>>   create mode 100644 include/uapi/linux/quic.h
>>   create mode 100644 net/ipv4/udp_ulp.c
>>   create mode 100644 net/quic/Kconfig
>>   create mode 100644 net/quic/Makefile
>>   create mode 100644 net/quic/quic_main.c
>>   create mode 100644 net/quic/quic_proc.c
>>   create mode 100644 tools/testing/selftests/net/quic.c
>>   create mode 100755 tools/testing/selftests/net/quic.sh
>>
>>
>> base-commit: fd78d07c7c35de260eb89f1be4a1e7487b8092ad
>> --
>> 2.30.2
>>
> Hi, Adel,
>
> I don't see how the key update(rfc9001#section-6) is handled on the TX
> path, which is not using TLS Key update, and "Key Phase" indicates
> which key will be used after rekeying. Also, I think it is almost
> impossible to handle the peer rekeying on the RX path either based on
> your current model in the future.

The key update is not present in these patches, but it is an important part 
of the QUIC functionality. As this patch only stores a single key, you are 
correct that this approach does not handle key rotation. To implement 
re-keying on Tx and on Rx, a rolling secret will need to be stored in the 
kernel; the kernel would then refresh the subsequent 1RTT (application 
space) keys itself. Eventually, when the hardware is mature enough to 
support QUIC encryption and decryption, the secret will need to be kept in 
the hardware so that it can react in time, especially on Rx. The Tx path 
could trigger a re-key at any point, or when the GCM counter (the packet 
number in this case) is close to exhaustion. The RFC expects the 
implementation to retain at least 2 keys, while keeping 3 (old, current 
and next) is not prohibited either. Keeping more is not necessary.
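
As a rough illustration, retaining two key generations per flow could look
like the sketch below. Everything except quic_internal_crypto_context is a
made-up name for this example and is not part of the series:

	/* Hypothetical per-flow state keeping the current and next 1RTT
	 * keys, indexed by the Key Phase bit (RFC 9001, section 6).
	 * Shown only to illustrate the "retain at least 2 keys" point.
	 */
	struct quic_tx_key_state {
		/* indexed by the Key Phase bit */
		struct quic_internal_crypto_context key[2];
		/* Key Phase bit of the currently active generation */
		u8 current_phase;
	};

	/* Pick the context for the phase the packet will carry; a phase
	 * with no installed key would be treated as "drop on Tx".
	 */
	static struct quic_internal_crypto_context *
	quic_tx_key_for_phase(struct quic_tx_key_state *ks, u8 key_phase)
	{
		return &ks->key[key_phase & 1];
	}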

>
> The patch seems to get the crypto_ctx by doing a connection hash table
> lookup in the sendmsg(), which is not good from the performance side.
> One QUIC connection can go over multiple UDP sockets, but I don't
> think one socket can be used by multiple QUIC connections. So why not
> save the ctx in the socket instead?
A single socket could have multiple connections originating from it, 
with different destinations, if the socket is not connected. An 
optimization could be made for connected sockets to cache the context 
and save the time spent on the lookup. Measuring the timing of the 
kernel operations did not reveal a significant amount of time spent in 
this lookup, due to the relatively small number of connections per 
socket in general. A shared table across multiple sockets might show a 
different performance profile.
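
Such an optimization could, for example, cache the last-used context on
the socket and fall back to the hash only on a miss. A sketch, where the
last_tx_conn field is hypothetical and not part of this series:

	/* Hypothetical fast path: remember the last-used Tx connection on
	 * the quic_context and consult the rhashtable only when the cached
	 * entry does not match the requested key.
	 */
	static struct quic_connection_rhash *
	quic_lookup_cached(struct quic_context *ctx,
			   const struct quic_connection_info_key *key)
	{
		struct quic_connection_rhash *c = ctx->last_tx_conn;

		if (c && !memcmp(&c->crypto_ctx.conn_info.key, key,
				 sizeof(*key)))
			return c;

		c = rhashtable_lookup_fast(&ctx->tx_connections, key,
					   quic_tx_connection_params);
		if (c)
			ctx->last_tx_conn = c;
		return c;
	}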
>
> The patch is to reduce the copying operations between user space and
> the kernel. I might miss something in your user space code, but the
> msg to send is *already packed* into the Stream Frame in user space,
> what's the difference if you encrypt it in userspace and then
> sendmsg(udp_sk) with zero-copy to the kernel.
It is possible to do it this way. Zero-copy works best with payload sizes 
of 32K and larger. Anything smaller than that loses the gains of zero-copy 
to the pre/post operations it requires and to the memory alignment 
constraints. The other consideration is that eventual support for QUIC 
encryption and decryption in hardware would integrate more naturally with 
the current approach.
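
For reference, the alternative being discussed would look roughly as below
in user space: encrypt in the library, then hand the ciphertext to the
kernel with MSG_ZEROCOPY (completion handling on the error queue omitted):

	#include <sys/socket.h>

	#ifndef SO_ZEROCOPY
	#define SO_ZEROCOPY 60
	#endif
	#ifndef MSG_ZEROCOPY
	#define MSG_ZEROCOPY 0x4000000
	#endif

	/* Encrypt the 1RTT packet in the library, then send the ciphertext
	 * with zero-copy. Completions must later be reaped from the socket
	 * error queue (not shown).
	 */
	static ssize_t send_ciphertext_zerocopy(int fd, const void *pkt,
						size_t len)
	{
		int one = 1;

		/* Idempotent; enables zero-copy completions on this socket. */
		if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)))
			return -1;
		return send(fd, pkt, len, MSG_ZEROCOPY);
	}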
>
> Didn't really understand the "GSO" you mentioned, as I don't see any
> code about kernel GSO, I guess it's just "Fragment size", right?
> BTW, it‘s not common to use "//" for the kernel annotation.
Once the payload arrives in the kernel, the GSO size set on the socket 
instructs the L3/L4 stack how to segment it. In this case, the plaintext 
QUIC packets should be aligned on the GSO boundaries less the tag size 
that will be added by encryption. For a GSO size of 1000 and a 16-byte 
tag, the QUIC packets in the batch for transmission should all be 984 
bytes long, except possibly the last one. Once the tag is attached, the 
resulting size of 1000 will correctly split the QUIC packets further down 
the stack into individual IP/UDP packets. The code also saves processing 
time by handing all packets to UDP in a single call when GSO is enabled.
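
To make the arithmetic concrete, a small userspace sketch with the 16-byte
AEAD tag size hard-coded for brevity:

	/* Size plaintext QUIC packets so that, after the kernel appends
	 * the 16-byte AEAD tag, each ciphertext packet lands exactly on
	 * the GSO boundary.
	 */
	#define QUIC_TAG_SIZE 16

	static size_t quic_plaintext_chunk(size_t gso_size)
	{
		return gso_size - QUIC_TAG_SIZE;	/* 1000 - 16 = 984 */
	}

A 5000-byte batch built for a GSO size of 1000 would therefore be packed
as 984 + 984 + 984 + 984 + 984 + 80 bytes of plaintext: every packet
except the last is exactly quic_plaintext_chunk(1000) bytes.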
>
> I'm not sure if it's worth adding a ULP layer over UDP for this QUIC
> TX only. Honestly, I'm more supporting doing a full QUIC stack in the
> kernel independently with socket APIs to use it:
> https://github.com/lxin/tls_hs.
>
> Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next] Fix reinitialization of TEST_PROGS in net self tests.
  2022-08-24 18:43 ` [net-next] Fix reinitialization of TEST_PROGS in net self tests Adel Abouchaev
  2022-08-24 20:12   ` Shuah Khan
@ 2022-08-25 20:30   ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 77+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-08-25 20:30 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: davem, edumazet, kuba, pabeni, shuah, netdev, linux-kselftest

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 24 Aug 2022 11:43:51 -0700 you wrote:
> Fix reinitialization of TEST_PROGS in net self tests.
> 
> Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
> ---
>  tools/testing/selftests/net/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Here is the summary with links:
  - [net-next] Fix reinitialization of TEST_PROGS in net self tests.
    https://git.kernel.org/netdev/net-next/c/88e500affe72

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [net-next v3 0/6] net: support QUIC crypto
       [not found] <Adel Abouchaev <adel.abushaev@gmail.com>
                   ` (5 preceding siblings ...)
  2022-08-24 18:43 ` [net-next] Fix reinitialization of TEST_PROGS in net self tests Adel Abouchaev
@ 2022-09-07  0:49 ` Adel Abouchaev
  2022-09-07  0:49   ` [net-next v3 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
                     ` (5 more replies)
  2022-09-09  0:12 ` [net-next v4 0/6] net: support QUIC crypto Adel Abouchaev
  7 siblings, 6 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-07  0:49 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, netdev,
	linux-doc, linux-kselftest

QUIC requires end to end encryption of the data. The application usually
prepares the data in clear text, encrypts and calls send() which implies
multiple copies of the data before the packets hit the networking stack.
Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
pressure by reducing the number of copies.

The scope of kernel support is limited to the symmetric cryptography,
leaving the handshake to the user space library. For QUIC in particular,
the application packets that require symmetric cryptography are the 1RTT
packets with short headers. Kernel will encrypt the application packets
on transmission and decrypt on receive. This series implements Tx only,
because in QUIC server applications Tx outweighs Rx by orders of
magnitude.

Supporting the combination of QUIC and GSO requires the application to
correctly place the data and the kernel to correctly slice it. The
encryption process appends an arbitrary number of bytes (tag) to the end
of the message to authenticate it. The GSO value should include this
overhead, the offload would then subtract the tag size to parse the
input on Tx before chunking and encrypting it.

With the kernel cryptography, the buffer copy operation is conjoined
with the encryption operation. The memory bandwidth is reduced by 5-8%.
When devices supporting QUIC encryption in hardware come to the market,
we will be able to free further 7% of CPU utilization which is used
today for crypto operations.

Adel Abouchaev (6):
  Documentation on QUIC kernel Tx crypto.
  Define QUIC specific constants, control and data plane structures
  Add UDP ULP operations, initialization and handling prototype
    functions.
  Implement QUIC offload functions
  Add flow counters and Tx processing error counter
  Add self tests for ULP operations, flow setup and crypto tests

 Documentation/networking/index.rst     |    1 +
 Documentation/networking/quic.rst      |  211 ++++
 include/net/inet_sock.h                |    2 +
 include/net/netns/mib.h                |    3 +
 include/net/quic.h                     |   63 +
 include/net/snmp.h                     |    6 +
 include/net/udp.h                      |   33 +
 include/uapi/linux/quic.h              |   66 +
 include/uapi/linux/snmp.h              |    9 +
 include/uapi/linux/udp.h               |    4 +
 net/Kconfig                            |    1 +
 net/Makefile                           |    1 +
 net/ipv4/Makefile                      |    3 +-
 net/ipv4/udp.c                         |   15 +
 net/ipv4/udp_ulp.c                     |  192 +++
 net/quic/Kconfig                       |   16 +
 net/quic/Makefile                      |    8 +
 net/quic/quic_main.c                   | 1533 ++++++++++++++++++++++++
 net/quic/quic_proc.c                   |   45 +
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    3 +-
 tools/testing/selftests/net/quic.c     | 1370 +++++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   46 +
 23 files changed, 3630 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/networking/quic.rst
 create mode 100644 include/net/quic.h
 create mode 100644 include/uapi/linux/quic.h
 create mode 100644 net/ipv4/udp_ulp.c
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c
 create mode 100644 net/quic/quic_proc.c
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

-- 
2.30.2


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [net-next v3 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-09-07  0:49 ` [net-next v3 0/6] net: support QUIC crypto Adel Abouchaev
@ 2022-09-07  0:49   ` Adel Abouchaev
  2022-09-07  3:38     ` Bagas Sanjaya
  2022-09-07  0:49   ` [net-next v3 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-07  0:49 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, netdev,
	linux-doc, linux-kselftest
  Cc: kernel test robot

Add documentation for kernel QUIC code.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Added the quic.rst reference to the index.rst file; fixed indentation in
the quic.rst file.
Reported-by: kernel test robot <lkp@intel.com>

Added SPDX license GPL 2.0.
v2: Removed whitespace at EOF.
v3: Added explanation of features.
---
 Documentation/networking/index.rst |   1 +
 Documentation/networking/quic.rst  | 211 +++++++++++++++++++++++++++++
 2 files changed, 212 insertions(+)
 create mode 100644 Documentation/networking/quic.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index bacadd09e570..0dacd8c8a3ff 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -89,6 +89,7 @@ Contents:
    plip
    ppp_generic
    proc_net_tcp
+   quic
    radiotap-headers
    rds
    regulatory
diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..2e6ec72f4eea
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,211 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========
+KERNEL QUIC
+===========
+
+Overview
+========
+
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 for more information on QUIC.
+
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are 1RTT packets
+in a QUIC connection. Encryption of all other packets is still done by the
+QUIC library in user space.
+
+The flow match is performed using 5 parameters: source and destination IP
+addresses, source and destination UDP ports and destination QUIC connection ID.
+Not all 5 parameters are always needed. The Tx direction matches the flow on
+the destination IP, port and destination connection ID, while the Rx part would
+later match on source IP, port and destination connection ID. This will cover
+multiple scenarios where the server is using SO_REUSEADDR and/or empty
+destination connection IDs, or a combination of these.
+
+The Rx direction is not implemented in this set of patches.
+
+The connection migration scenario is not handled by the kernel code and will
+be handled by the user space portion of the QUIC library. On the Tx direction,
+the new key would be installed before a packet with an updated destination is
+sent. On the Rx direction, the behavior will be to drop a packet if a flow is
+missing.
+
+For key rotation, the behavior is to drop packets on Tx when an encryption
+key with a matching key rotation bit is not present. On the Rx direction, the
+packet will be sent to the userspace library with an unencrypted header and an
+encrypted payload. A separate indication will be added to the ancillary data
+to report that the operation did not match the current key bit. It is not
+possible to use the key rotation bit as part of the key for the flow lookup,
+as that bit is covered by the header protection. A special provision will need
+to be made in user mode to still attempt decryption of the payload, to prevent
+a timing attack.
+
+
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+
+A QUIC connection originates and terminates in the application, using one of
+many available QUIC libraries. The code instantiates a QUIC client and a QUIC
+server in some form and configures them to use certain addresses and ports for
+the source and destination. The client and server negotiate the set of keys to
+protect the communication during different phases of the connection, maintain
+the connection and perform congestion control.
+
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow that should be encrypted by the kernel needs to be registered with
+the kernel using the socket API. A setsockopt() call on the socket creates an
+association between the QUIC connection ID of the flow and the encryption
+parameters for the crypto operations:
+
+.. code-block:: c
+
+	struct quic_connection_info conn_info;
+	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+	const size_t conn_id_len = sizeof(conn_id);
+	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			    0x08, 0x09, 0x0a, 0x0b};
+	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
+				};
+
+	conn_info.conn_payload_key_gen = 0;
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.dst_conn_id_length = conn_id_len;
+	memcpy(&conn_info.key.dst_conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+					  - conn_id_len],
+	       &conn_id, conn_id_len);
+
+	memcpy(conn_info.aes_gcm_128.payload_key, conn_key, sizeof(conn_key));
+	memcpy(conn_info.aes_gcm_128.payload_iv, conn_iv, sizeof(conn_iv));
+	memcpy(conn_info.aes_gcm_128.header_key, conn_hdr_key,
+	       sizeof(conn_hdr_key));
+
+	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+
+Requesting to remove QUIC Tx kernel crypto offload control messages
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. To request an explicit removal
+of the offload for a connection during the lifetime of the socket, the process
+is similar to adding the flow. Only the connection ID and its length need to be
+supplied to remove the connection from the offload:
+
+.. code-block:: c
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.dst_conn_id_length = conn_id_len;
+	memcpy(&conn_info.key.dst_conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+					  - conn_id_len],
+	       &conn_id, conn_id_len);
+	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+Sending QUIC application data
+-----------------------------
+
+For QUIC Tx encryption offload, the application should use sendmsg() socket
+call and provide ancillary data with information on connection ID length and
+offload flags for the kernel to perform the encryption and GSO support if
+requested.
+
+.. code-block:: c
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_tx_ancillary_data *anc_data;
+	size_t quic_data_len = 4500;
+	struct cmsghdr *cmsg_hdr;
+	char quic_data[9000];
+	struct iovec iov[2];
+	int send_len = 9000;
+	struct msghdr msg;
+	int err;
+
+	iov[0].iov_base = quic_data;
+	iov[0].iov_len = quic_data_len;
+	iov[1].iov_base = quic_data + 4500;
+	iov[1].iov_len = quic_data_len;
+
+	if (client.addr.sin_family == AF_INET) {
+		msg.msg_name = &client.addr;
+		msg.msg_namelen = sizeof(client.addr);
+	} else {
+		msg.msg_name = &client.addr6;
+		msg.msg_namelen = sizeof(client.addr6);
+	}
+
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->dst_conn_id_length = conn_id_len;
+	err = sendmsg(fd, &msg, 0);
+
+The QUIC Tx offload in the kernel will read the data from userspace, and will
+encrypt and copy it into the ciphertext buffer within the same operation.
+
+
+Sending QUIC application data with GSO
+--------------------------------------
+
+When GSO is in use, the kernel will use the GSO fragment size as the target
+size for the ciphertext. The packets from user space should align on the
+boundary of the GSO fragment size minus the size of the tag for the chosen
+cipher. For a GSO fragment size of 1200, the plaintext packets should follow
+each other every 1184 bytes, given a tag size of 16. After encryption, the
+rest of the UDP and IP stack will use the configured GSO fragment size, which
+includes the trailing tag bytes.
+
+To set up GSO fragmentation:
+
+.. code-block:: c
+
+	setsockopt(fd, SOL_UDP, UDP_SEGMENT, &frag_size,
+		   sizeof(frag_size));
+
+If the GSO fragment size is provided in ancillary data within the sendmsg()
+call, the value in ancillary data will take precedence over the segment size
+provided in setsockopt to split the payload into packets. This is consistent
+with the UDP stack behavior.
+
+Integrating to userspace QUIC libraries
+---------------------------------------
+
+Integration with userspace QUIC libraries depends on the implementation of the
+QUIC protocol. For the MVFST library, the control plane is integrated into the
+handshake callbacks to properly configure the flows on the socket, and the
+data plane is integrated into the methods that perform encryption and send
+the packets to the batch scheduler for transmission on the socket.
+
+MVFST library can be found at https://github.com/facebookincubator/mvfst.
+
+Statistics
+==========
+
+QUIC Tx offload to the kernel has counters
+(``/proc/net/quic_stat``):
+
+- ``QuicCurrTxSw`` -
+  number of currently active kernel offloaded QUIC connections
+- ``QuicTxSw`` -
+  cumulative total number of offloaded QUIC connections
+- ``QuicTxSwError`` -
+  cumulative total number of errors during QUIC Tx offload to the kernel
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v3 2/6] net: Define QUIC specific constants, control and data plane structures
  2022-09-07  0:49 ` [net-next v3 0/6] net: support QUIC crypto Adel Abouchaev
  2022-09-07  0:49   ` [net-next v3 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-09-07  0:49   ` Adel Abouchaev
  2022-09-07  0:49   ` [net-next v3 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-07  0:49 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, netdev,
	linux-doc, linux-kselftest

Define the control and data plane structures that are passed in the
control plane for flow add/remove, and within ancillary data during
packet send. Define constants to use within SOL_UDP to program QUIC
sockets.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

v3: added a 3-tuple to map a flow to a key; added the key generation
to the flow context.
---
 include/uapi/linux/quic.h | 66 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/udp.h  |  3 ++
 2 files changed, 69 insertions(+)
 create mode 100644 include/uapi/linux/quic.h

diff --git a/include/uapi/linux/quic.h b/include/uapi/linux/quic.h
new file mode 100644
index 000000000000..1fd9d2ed8683
--- /dev/null
+++ b/include/uapi/linux/quic.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_QUIC_H
+#define _UAPI_LINUX_QUIC_H
+
+#include <linux/types.h>
+#include <linux/tls.h>
+
+#define QUIC_MAX_CONNECTION_ID_SIZE	20
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_BYPASS_ENCRYPTION	0x01
+
+struct quic_tx_ancillary_data {
+	__aligned_u64	next_pkt_num;
+	__u8	flags;
+	__u8	dst_conn_id_length;
+};
+
+struct quic_connection_info_key {
+	__u8	dst_conn_id[QUIC_MAX_CONNECTION_ID_SIZE];
+	__u8	dst_conn_id_length;
+	union {
+		struct in6_addr ipv6_addr;
+		struct in_addr  ipv4_addr;
+	} addr;
+	__be16  udp_port;
+};
+
+struct quic_aes_gcm_128 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+};
+
+struct quic_aes_gcm_256 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_256_IV_SIZE];
+};
+
+struct quic_aes_ccm_128 {
+	__u8	header_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_CCM_128_IV_SIZE];
+};
+
+struct quic_chacha20_poly1305 {
+	__u8	header_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE];
+};
+
+struct quic_connection_info {
+	__u16	cipher_type;
+	struct quic_connection_info_key		key;
+	__u8	conn_payload_key_gen;
+	union {
+		struct quic_aes_gcm_128 aes_gcm_128;
+		struct quic_aes_gcm_256 aes_gcm_256;
+		struct quic_aes_ccm_128 aes_ccm_128;
+		struct quic_chacha20_poly1305 chacha20_poly1305;
+	};
+};
+
+#endif
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 4828794efcf8..0ee4c598e70b 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,9 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
+#define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
+#define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v3 3/6] net: Add UDP ULP operations, initialization and handling prototype functions.
  2022-09-07  0:49 ` [net-next v3 0/6] net: support QUIC crypto Adel Abouchaev
  2022-09-07  0:49   ` [net-next v3 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
  2022-09-07  0:49   ` [net-next v3 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
@ 2022-09-07  0:49   ` Adel Abouchaev
  2022-09-07  0:49   ` [net-next v3 4/6] net: Implement QUIC offload functions Adel Abouchaev
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-07  0:49 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, netdev,
	linux-doc, linux-kselftest

Define functions to add UDP ULP handling, registration with UDP protocol
and supporting data structures. Create structure for QUIC ULP and add empty
prototype functions to support it.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Moved the reference to net/quic/Kconfig from this patch into the next.

Fixed formatting around brackets.
---
 include/net/inet_sock.h  |   2 +
 include/net/udp.h        |  33 +++++++
 include/uapi/linux/udp.h |   1 +
 net/Makefile             |   1 +
 net/ipv4/Makefile        |   3 +-
 net/ipv4/udp.c           |   6 ++
 net/ipv4/udp_ulp.c       | 192 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 237 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/udp_ulp.c

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index bf5654ce711e..650e332bdb50 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -249,6 +249,8 @@ struct inet_sock {
 	__be32			mc_addr;
 	struct ip_mc_socklist __rcu	*mc_list;
 	struct inet_cork_full	cork;
+	const struct udp_ulp_ops	*udp_ulp_ops;
+	void __rcu		*ulp_data;
 };
 
 #define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
diff --git a/include/net/udp.h b/include/net/udp.h
index 5ee88ddf79c3..f22ebabbb186 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -523,4 +523,37 @@ struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
 int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 #endif
 
+/*
+ * Interface for adding Upper Level Protocols over UDP
+ */
+
+#define UDP_ULP_NAME_MAX	16
+#define UDP_ULP_MAX		128
+
+struct udp_ulp_ops {
+	struct list_head	list;
+
+	/* initialize ulp */
+	int (*init)(struct sock *sk);
+	/* cleanup ulp */
+	void (*release)(struct sock *sk);
+
+	char		name[UDP_ULP_NAME_MAX];
+	struct module	*owner;
+};
+
+int udp_register_ulp(struct udp_ulp_ops *type);
+void udp_unregister_ulp(struct udp_ulp_ops *type);
+int udp_set_ulp(struct sock *sk, const char *name);
+void udp_get_available_ulp(char *buf, size_t len);
+void udp_cleanup_ulp(struct sock *sk);
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval,
+		       unsigned int optlen);
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval,
+		       int __user *optlen);
+
+#define MODULE_ALIAS_UDP_ULP(name)\
+	__MODULE_INFO(alias, alias_userspace, name);\
+	__MODULE_INFO(alias, alias_udp_ulp, "udp-ulp-" name)
+
 #endif	/* _UDP_H */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 0ee4c598e70b..893691f0108a 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_ULP		105	/* Attach ULP to a UDP socket */
 #define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
 #define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
 #define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
diff --git a/net/Makefile b/net/Makefile
index 6a62e5b27378..021ea3698d3a 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -16,6 +16,7 @@ obj-y				+= ethernet/ 802/ sched/ netlink/ bpf/ ethtool/
 obj-$(CONFIG_NETFILTER)		+= netfilter/
 obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_TLS)		+= tls/
+obj-$(CONFIG_QUIC)		+= quic/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX_SCM)		+= unix/
 obj-y				+= ipv6/
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index bbdd9c44f14e..88d3baf4af95 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,8 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o
+	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o \
+	     udp_ulp.o
 
 obj-$(CONFIG_BPFILTER) += bpfilter/
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 34eda973bbf1..027c4513a9cd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2779,6 +2779,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->pcflag |= UDPLITE_RECV_CC;
 		break;
 
+	case UDP_ULP:
+		return udp_setsockopt_ulp(sk, optval, optlen);
+
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -2847,6 +2850,9 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
 		val = up->pcrlen;
 		break;
 
+	case UDP_ULP:
+		return udp_getsockopt_ulp(sk, optval, optlen);
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/udp_ulp.c b/net/ipv4/udp_ulp.c
new file mode 100644
index 000000000000..138818690151
--- /dev/null
+++ b/net/ipv4/udp_ulp.c
@@ -0,0 +1,192 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Pluggable UDP upper layer protocol support, based on pluggable TCP upper
+ * layer protocol support.
+ *
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwatson@fb.com>. All rights
+ * reserved.
+ * Copyright (c) 2021-2022, Meta Platforms, Inc. All rights reserved.
+ */
+
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/skmsg.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+
+static DEFINE_SPINLOCK(udp_ulp_list_lock);
+static LIST_HEAD(udp_ulp_list);
+
+/* Simple linear search, don't expect many entries! */
+static struct udp_ulp_ops *udp_ulp_find(const char *name)
+{
+	struct udp_ulp_ops *e;
+
+	list_for_each_entry_rcu(e, &udp_ulp_list, list,
+				lockdep_is_held(&udp_ulp_list_lock)) {
+		if (strcmp(e->name, name) == 0)
+			return e;
+	}
+
+	return NULL;
+}
+
+static const struct udp_ulp_ops *__udp_ulp_find_autoload(const char *name)
+{
+	const struct udp_ulp_ops *ulp = NULL;
+
+	rcu_read_lock();
+	ulp = udp_ulp_find(name);
+
+#ifdef CONFIG_MODULES
+	if (!ulp && capable(CAP_NET_ADMIN)) {
+		rcu_read_unlock();
+		request_module("udp-ulp-%s", name);
+		rcu_read_lock();
+		ulp = udp_ulp_find(name);
+	}
+#endif
+	if (!ulp || !try_module_get(ulp->owner))
+		ulp = NULL;
+
+	rcu_read_unlock();
+	return ulp;
+}
+
+/* Attach new upper layer protocol to the list
+ * of available protocols.
+ */
+int udp_register_ulp(struct udp_ulp_ops *ulp)
+{
+	int ret = 0;
+
+	spin_lock(&udp_ulp_list_lock);
+	if (udp_ulp_find(ulp->name))
+		ret = -EEXIST;
+	else
+		list_add_tail_rcu(&ulp->list, &udp_ulp_list);
+
+	spin_unlock(&udp_ulp_list_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(udp_register_ulp);
+
+void udp_unregister_ulp(struct udp_ulp_ops *ulp)
+{
+	spin_lock(&udp_ulp_list_lock);
+	list_del_rcu(&ulp->list);
+	spin_unlock(&udp_ulp_list_lock);
+
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(udp_unregister_ulp);
+
+void udp_cleanup_ulp(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	/* No sock_owned_by_me() check here as at the time the
+	 * stack calls this function, the socket is dead and
+	 * about to be destroyed.
+	 */
+	if (!inet->udp_ulp_ops)
+		return;
+
+	if (inet->udp_ulp_ops->release)
+		inet->udp_ulp_ops->release(sk);
+	module_put(inet->udp_ulp_ops->owner);
+
+	inet->udp_ulp_ops = NULL;
+}
+
+static int __udp_set_ulp(struct sock *sk, const struct udp_ulp_ops *ulp_ops)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int err;
+
+	err = -EEXIST;
+	if (inet->udp_ulp_ops)
+		goto out_err;
+
+	err = ulp_ops->init(sk);
+	if (err)
+		goto out_err;
+
+	inet->udp_ulp_ops = ulp_ops;
+	return 0;
+
+out_err:
+	module_put(ulp_ops->owner);
+	return err;
+}
+
+int udp_set_ulp(struct sock *sk, const char *name)
+{
+	struct sk_psock *psock = sk_psock_get(sk);
+	const struct udp_ulp_ops *ulp_ops;
+
+	if (psock) {
+		sk_psock_put(sk, psock);
+		return -EINVAL;
+	}
+
+	sock_owned_by_me(sk);
+	ulp_ops = __udp_ulp_find_autoload(name);
+	if (!ulp_ops)
+		return -ENOENT;
+
+	return __udp_set_ulp(sk, ulp_ops);
+}
+
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	char name[UDP_ULP_NAME_MAX];
+	int val, err;
+
+	if (!optlen || optlen > UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	val = strncpy_from_sockptr(name, optval, optlen);
+	if (val < 0)
+		return -EFAULT;
+
+	if (val == UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	name[val] = 0;
+	lock_sock(sk);
+	err = udp_set_ulp(sk, name);
+	release_sock(sk);
+	return err;
+}
+
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval, int __user *optlen)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int len;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	len = min_t(unsigned int, len, UDP_ULP_NAME_MAX);
+	if (len < 0)
+		return -EINVAL;
+
+	if (!inet->udp_ulp_ops) {
+		if (put_user(0, optlen))
+			return -EFAULT;
+		return 0;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, inet->udp_ulp_ops->name, len))
+		return -EFAULT;
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v3 4/6] net: Implement QUIC offload functions
  2022-09-07  0:49 ` [net-next v3 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (2 preceding siblings ...)
  2022-09-07  0:49   ` [net-next v3 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
@ 2022-09-07  0:49   ` Adel Abouchaev
  2022-09-07  0:49   ` [net-next v3 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
  2022-09-07  0:49   ` [net-next v3 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-07  0:49 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, netdev,
	linux-doc, linux-kselftest

Add a connection hash to the context to support add and remove
operations on QUIC connections for the control plane, and lookups for
the data plane. Implement setsockopt() and add placeholders to add and
delete Tx connections.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Added net/quic/Kconfig reference to net/Kconfig in this commit.

Initialized pointers with NULL vs 0. Restricted the AES counter to
__le32. Added address space qualifiers to user space addresses. Removed
empty lines. Updated code alignment. Removed inlines.

v3: removed ITER_KVEC flag from iov_iter_kvec call.
v3: fixed Chacha20 encryption bug.
v3: updated to match the uAPI struct fields
v3: updated Tx flow to match on dst ip, dst port and connection id.
v3: updated to drop packets if key generations do not match.
---
 include/net/quic.h   |   53 ++
 net/Kconfig          |    1 +
 net/ipv4/udp.c       |    9 +
 net/quic/Kconfig     |   16 +
 net/quic/Makefile    |    8 +
 net/quic/quic_main.c | 1487 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1574 insertions(+)
 create mode 100644 include/net/quic.h
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c

diff --git a/include/net/quic.h b/include/net/quic.h
new file mode 100644
index 000000000000..cafe01174e60
--- /dev/null
+++ b/include/net/quic.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef INCLUDE_NET_QUIC_H
+#define INCLUDE_NET_QUIC_H
+
+#include <linux/mutex.h>
+#include <linux/rhashtable.h>
+#include <linux/skmsg.h>
+#include <uapi/linux/quic.h>
+
+#define QUIC_MAX_SHORT_HEADER_SIZE      25
+#define QUIC_MAX_CONNECTION_ID_SIZE     20
+#define QUIC_HDR_MASK_SIZE              16
+#define QUIC_MAX_GSO_FRAGS              16
+
+/* Maximum IV and nonce sizes should be in sync with supported ciphers. */
+#define QUIC_CIPHER_MAX_IV_SIZE		12
+#define QUIC_CIPHER_MAX_NONCE_SIZE	16
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_ANCILLARY_FLAGS    (QUIC_BYPASS_ENCRYPTION)
+
+#define QUIC_MAX_IOVEC_SEGMENTS		8
+#define QUIC_MAX_SG_ALLOC_ELEMENTS	32
+#define QUIC_MAX_PLAIN_PAGES		16
+#define QUIC_MAX_CIPHER_PAGES_ORDER	4
+
+struct quic_internal_crypto_context {
+	struct quic_connection_info	conn_info;
+	struct crypto_skcipher		*header_tfm;
+	struct crypto_aead		*packet_aead;
+};
+
+struct quic_connection_rhash {
+	struct rhash_head			node;
+	struct quic_internal_crypto_context	crypto_ctx;
+	struct rcu_head				rcu;
+};
+
+struct quic_context {
+	struct proto		*sk_proto;
+	struct rhashtable	tx_connections;
+	struct scatterlist	sg_alloc[QUIC_MAX_SG_ALLOC_ELEMENTS];
+	struct page		*cipher_page;
+	/* To synchronize concurrent sendmsg() requests through the same socket
+	 * and protect preallocated per-context memory.
+	 */
+	struct mutex		sendmsg_mux;
+	struct rcu_head		rcu;
+};
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 48c33c222199..6824d07b9e57 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -63,6 +63,7 @@ menu "Networking options"
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/tls/Kconfig"
+source "net/quic/Kconfig"
 source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 source "net/smc/Kconfig"
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 027c4513a9cd..e7cbbea9d8d9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,6 +113,7 @@
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
+#include <uapi/linux/quic.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6_stubs.h>
 #endif
@@ -1011,6 +1012,14 @@ static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
 			return -EINVAL;
 		*gso_size = *(__u16 *)CMSG_DATA(cmsg);
 		return 0;
+	case UDP_QUIC_ENCRYPT:
+		/* This option is handled in UDP_ULP and is only checked
+		 * here for the bypass bit
+		 */
+		if (cmsg->cmsg_len !=
+		    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+			return -EINVAL;
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/quic/Kconfig b/net/quic/Kconfig
new file mode 100644
index 000000000000..661cb989508a
--- /dev/null
+++ b/net/quic/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# QUIC configuration
+#
+config QUIC
+	tristate "QUIC encryption offload"
+	depends on INET
+	select CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_GCM
+	help
+	Enable kernel support for QUIC crypto offload. Currently only TX
+	encryption offload is supported. The kernel will perform
+	copy-during-encryption.
+
+	If unsure, say N.
diff --git a/net/quic/Makefile b/net/quic/Makefile
new file mode 100644
index 000000000000..928239c4d08c
--- /dev/null
+++ b/net/quic/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the QUIC subsystem
+#
+
+obj-$(CONFIG_QUIC) += quic.o
+
+quic-y := quic_main.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
new file mode 100644
index 000000000000..a43d989a1c8e
--- /dev/null
+++ b/net/quic/quic_main.c
@@ -0,0 +1,1487 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <crypto/skcipher.h>
+#include <linux/bug.h>
+#include <linux/module.h>
+#include <linux/rhashtable.h>
+/* Include header to use TLS constants for AEAD cipher. */
+#include <net/tls.h>
+#include <net/quic.h>
+#include <net/udp.h>
+#include <uapi/linux/quic.h>
+
+static unsigned long af_init_done;
+static struct proto quic_v4_proto;
+static struct proto quic_v6_proto;
+static DEFINE_SPINLOCK(quic_proto_lock);
+
+static u32 quic_tx_connection_hash(const void *data, u32 len, u32 seed)
+{
+	return jhash(data, len, seed);
+}
+
+static u32 quic_tx_connection_hash_obj(const void *data, u32 len, u32 seed)
+{
+	const struct quic_connection_rhash *connhash = data;
+
+	return jhash(&connhash->crypto_ctx.conn_info.key,
+		     sizeof(struct quic_connection_info_key), seed);
+}
+
+static int quic_tx_connection_hash_cmp(struct rhashtable_compare_arg *arg,
+				       const void *ptr)
+{
+	const struct quic_connection_info_key *key = arg->key;
+	const struct quic_connection_rhash *x = ptr;
+
+	return !!memcmp(&x->crypto_ctx.conn_info.key,
+			key,
+			sizeof(struct quic_connection_info_key));
+}
+
+static const struct rhashtable_params quic_tx_connection_params = {
+	.key_len		= sizeof(struct quic_connection_info_key),
+	.head_offset		= offsetof(struct quic_connection_rhash, node),
+	.hashfn			= quic_tx_connection_hash,
+	.obj_hashfn		= quic_tx_connection_hash_obj,
+	.obj_cmpfn		= quic_tx_connection_hash_cmp,
+	.automatic_shrinking	= true,
+};
+
+static size_t quic_crypto_key_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_KEY_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_KEY_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_KEY_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE;
+	default:
+		break;
+	}
+	WARN(1, "Unsupported cipher type\n");
+	return 0;
+}
+
+static size_t quic_crypto_tag_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_TAG_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_TAG_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
+	default:
+		break;
+	}
+	WARN(1, "Unsupported cipher type\n");
+	return 0;
+}
+
+static size_t quic_crypto_nonce_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_128_SALT_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_256_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_256_SALT_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_CCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_CCM_128_SALT_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+			     TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+		       TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE;
+	default:
+		break;
+	}
+	WARN(1, "Unsupported cipher type\n");
+	return 0;
+}
+
+static u8 *quic_payload_iv(struct quic_internal_crypto_context *crypto_ctx)
+{
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return crypto_ctx->conn_info.aes_gcm_128.payload_iv;
+	case TLS_CIPHER_AES_GCM_256:
+		return crypto_ctx->conn_info.aes_gcm_256.payload_iv;
+	case TLS_CIPHER_AES_CCM_128:
+		return crypto_ctx->conn_info.aes_ccm_128.payload_iv;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return crypto_ctx->conn_info.chacha20_poly1305.payload_iv;
+	default:
+		break;
+	}
+	WARN(1, "Unsupported cipher type\n");
+	return NULL;
+}
+
+static int
+quic_config_header_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_skcipher *tfm;
+	char *header_cipher;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_128.header_key;
+		break;
+	case TLS_CIPHER_AES_GCM_256:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_256.header_key;
+		break;
+	case TLS_CIPHER_AES_CCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_ccm_128.header_key;
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		header_cipher = "chacha20";
+		key = crypto_ctx->conn_info.chacha20_poly1305.header_key;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	tfm = crypto_alloc_skcipher(header_cipher, 0, 0);
+	if (IS_ERR(tfm)) {
+		rc = PTR_ERR(tfm);
+		goto out;
+	}
+
+	rc = crypto_skcipher_setkey(tfm, key,
+				    quic_crypto_key_size(crypto_ctx->conn_info
+							 .cipher_type));
+	if (rc) {
+		crypto_free_skcipher(tfm);
+		goto out;
+	}
+
+	crypto_ctx->header_tfm = tfm;
+
+out:
+	return rc;
+}
+
+static int
+quic_config_packet_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_aead *aead;
+	char *cipher_name;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128: {
+		key = crypto_ctx->conn_info.aes_gcm_128.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_GCM_256: {
+		key = crypto_ctx->conn_info.aes_gcm_256.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_CCM_128: {
+		key = crypto_ctx->conn_info.aes_ccm_128.payload_key;
+		cipher_name = "ccm(aes)";
+		break;
+	}
+	case TLS_CIPHER_CHACHA20_POLY1305: {
+		key = crypto_ctx->conn_info.chacha20_poly1305.payload_key;
+		cipher_name = "rfc7539(chacha20,poly1305)";
+		break;
+	}
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	aead = crypto_alloc_aead(cipher_name, 0, 0);
+	if (IS_ERR(aead)) {
+		rc = PTR_ERR(aead);
+		goto out;
+	}
+
+	rc = crypto_aead_setkey(aead, key,
+				quic_crypto_key_size(crypto_ctx->conn_info
+						     .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	rc = crypto_aead_setauthsize(aead,
+				     quic_crypto_tag_size(crypto_ctx->conn_info
+							  .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	crypto_ctx->packet_aead = aead;
+	goto out;
+
+free_aead:
+	crypto_free_aead(aead);
+
+out:
+	return rc;
+}
+
+static inline struct quic_context *quic_get_ctx(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	return (__force void *)rcu_access_pointer(inet->ulp_data);
+}
+
+static void quic_free_cipher_page(struct page *page)
+{
+	__free_pages(page, QUIC_MAX_CIPHER_PAGES_ORDER);
+}
+
+static struct quic_context *quic_ctx_create(void)
+{
+	struct quic_context *ctx;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+
+	mutex_init(&ctx->sendmsg_mux);
+	ctx->cipher_page = alloc_pages(GFP_KERNEL, QUIC_MAX_CIPHER_PAGES_ORDER);
+	if (!ctx->cipher_page)
+		goto out_err;
+
+	if (rhashtable_init(&ctx->tx_connections,
+			    &quic_tx_connection_params) < 0) {
+		quic_free_cipher_page(ctx->cipher_page);
+		goto out_err;
+	}
+
+	return ctx;
+
+out_err:
+	kfree(ctx);
+	return NULL;
+}
+
+static int quic_getsockopt(struct sock *sk, int level, int optname,
+			   char __user *optval, int __user *optlen)
+{
+	struct quic_context *ctx = quic_get_ctx(sk);
+
+	return ctx->sk_proto->getsockopt(sk, level, optname, optval, optlen);
+}
+
+static void quic_update_key_if_mapped_ipv4(struct quic_connection_info_key *key)
+{
+	if (ipv6_addr_v4mapped(&key->addr.ipv6_addr)) {
+		key->addr.ipv6_addr.s6_addr32[0] =
+			key->addr.ipv6_addr.s6_addr32[3];
+		key->addr.ipv6_addr.s6_addr32[1] = 0;
+		key->addr.ipv6_addr.s6_addr32[2] = 0;
+		key->addr.ipv6_addr.s6_addr32[3] = 0;
+	}
+}
+
+static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	int rc = 0;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	connhash = kzalloc(sizeof(*connhash), GFP_KERNEL);
+	if (!connhash)
+		return -ENOMEM;
+
+	crypto_ctx = &connhash->crypto_ctx;
+	rc = copy_from_sockptr(&crypto_ctx->conn_info, optval,
+			       sizeof(crypto_ctx->conn_info));
+	if (rc) {
+		rc = -EFAULT;
+		goto err_crypto_info;
+	}
+
+	quic_update_key_if_mapped_ipv4(&crypto_ctx->conn_info.key);
+
+	if (crypto_ctx->conn_info.key.dst_conn_id_length >
+	    QUIC_MAX_CONNECTION_ID_SIZE) {
+		rc = -EINVAL;
+		goto err_crypto_info;
+	}
+
+	if (crypto_ctx->conn_info.conn_payload_key_gen > 1) {
+		rc = -EINVAL;
+		goto err_crypto_info;
+	}
+
+	/* Create all crypto materials for packet and header encryption. */
+	rc = quic_config_header_crypto(crypto_ctx);
+	if (rc)
+		goto err_crypto_info;
+
+	rc = quic_config_packet_crypto(crypto_ctx);
+	if (rc)
+		goto err_free_skcipher;
+
+	/* Insert crypto data into the hash per connection ID. */
+	rc = rhashtable_insert_fast(&ctx->tx_connections, &connhash->node,
+				    quic_tx_connection_params);
+	if (rc < 0)
+		goto err_free_ciphers;
+
+	return 0;
+
+err_free_ciphers:
+	crypto_free_aead(crypto_ctx->packet_aead);
+
+err_free_skcipher:
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+
+err_crypto_info:
+	/* Wipe out all crypto materials. */
+	memzero_explicit(&connhash->crypto_ctx, sizeof(connhash->crypto_ctx));
+	kfree(connhash);
+	return rc;
+}
+
+static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	struct quic_connection_info conn_info;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&conn_info, optval, optlen))
+		return -EFAULT;
+
+	if (conn_info.key.dst_conn_id_length >
+	    QUIC_MAX_CONNECTION_ID_SIZE)
+		return -EINVAL;
+
+	if (conn_info.conn_payload_key_gen > 1)
+		return -EINVAL;
+
+	quic_update_key_if_mapped_ipv4(&conn_info.key);
+
+	connhash = rhashtable_lookup_fast(&ctx->tx_connections,
+					  &conn_info.key,
+					  quic_tx_connection_params);
+	if (!connhash)
+		return -EINVAL;
+
+	rhashtable_remove_fast(&ctx->tx_connections,
+			       &connhash->node,
+			       quic_tx_connection_params);
+
+	crypto_ctx = &connhash->crypto_ctx;
+
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+	crypto_free_aead(crypto_ctx->packet_aead);
+	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	kfree(connhash);
+
+	return 0;
+}
+
+static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
+			      unsigned int optlen)
+{
+	int rc = 0;
+
+	switch (optname) {
+	case UDP_QUIC_ADD_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_add_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	case UDP_QUIC_DEL_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_del_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	default:
+		rc = -ENOPROTOOPT;
+		break;
+	}
+
+	return rc;
+}
+
+static int quic_setsockopt(struct sock *sk, int level, int optname,
+			   sockptr_t optval, unsigned int optlen)
+{
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	sk_proto = ctx->sk_proto;
+	rcu_read_unlock();
+
+	if (level == SOL_UDP &&
+	    (optname == UDP_QUIC_ADD_TX_CONNECTION ||
+	     optname == UDP_QUIC_DEL_TX_CONNECTION))
+		return do_quic_setsockopt(sk, optname, optval, optlen);
+
+	return sk_proto->setsockopt(sk, level, optname, optval, optlen);
+}
+
+static int
+quic_extract_ancillary_data(struct msghdr *msg,
+			    struct quic_tx_ancillary_data *ancillary_data,
+			    u16 *udp_pkt_size)
+{
+	struct cmsghdr *cmsg_hdr = NULL;
+	void *ancillary_data_ptr = NULL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	for_each_cmsghdr(cmsg_hdr, msg) {
+		if (!CMSG_OK(msg, cmsg_hdr))
+			return -EINVAL;
+
+		if (cmsg_hdr->cmsg_level != IPPROTO_UDP)
+			continue;
+
+		if (cmsg_hdr->cmsg_type == UDP_QUIC_ENCRYPT) {
+			if (cmsg_hdr->cmsg_len !=
+			    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+				return -EINVAL;
+			memcpy((void *)ancillary_data, CMSG_DATA(cmsg_hdr),
+			       sizeof(struct quic_tx_ancillary_data));
+			ancillary_data_ptr = cmsg_hdr;
+		} else if (cmsg_hdr->cmsg_type == UDP_SEGMENT) {
+			if (cmsg_hdr->cmsg_len != CMSG_LEN(sizeof(u16)))
+				return -EINVAL;
+			memcpy((void *)udp_pkt_size, CMSG_DATA(cmsg_hdr),
+			       sizeof(u16));
+		}
+	}
+
+	if (!ancillary_data_ptr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int quic_sendmsg_validate(struct msghdr *msg)
+{
+	if (!iter_is_iovec(&msg->msg_iter))
+		return -EINVAL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct quic_connection_rhash
+*quic_lookup_connection(struct quic_context *ctx,
+			u8 *conn_id,
+			struct quic_tx_ancillary_data *ancillary_data,
+			sa_family_t sa_family,
+			void *addr,
+			__be16 port)
+{
+	struct quic_connection_info_key conn_key;
+	size_t addrlen;
+
+	/* Look up connection information by the connection key. */
+	memset(&conn_key, 0, sizeof(struct quic_connection_info_key));
+	/* Fill the connection ID up to the max connection ID length. */
+	if (ancillary_data->dst_conn_id_length > QUIC_MAX_CONNECTION_ID_SIZE)
+		return NULL;
+
+	conn_key.dst_conn_id_length = ancillary_data->dst_conn_id_length;
+	if (ancillary_data->dst_conn_id_length)
+		memcpy(conn_key.dst_conn_id,
+		       conn_id,
+		       ancillary_data->dst_conn_id_length);
+
+	addrlen = (sa_family == AF_INET) ? 4 : 16;
+	memcpy(&conn_key.addr, addr, addrlen);
+	conn_key.udp_port = port;
+
+	return rhashtable_lookup_fast(&ctx->tx_connections,
+				      &conn_key,
+				      quic_tx_connection_params);
+}
+
+static int quic_sg_capacity_from_msg(const size_t pkt_size,
+				     const off_t offset,
+				     const size_t length)
+{
+	size_t	pages = 0;
+	size_t	pkts = 0;
+
+	pages = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+	pkts = DIV_ROUND_UP(length, pkt_size);
+	return pages + pkts + 1;
+}
+
+static void quic_put_plain_user_pages(struct page **pages, size_t nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (i == 0 || pages[i] != pages[i - 1])
+			put_page(pages[i]);
+}
+
+static int quic_get_plain_user_pages(struct msghdr * const msg,
+				     struct page **pages,
+				     int *page_indices)
+{
+	void __user	*data_addr;
+	size_t	nr_mapped = 0;
+	size_t	nr_pages = 0;
+	void	*page_addr;
+	long	count = 0;
+	off_t	data_off;
+	int	ret = 0;
+	int	i;
+
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		data_addr = msg->msg_iter.iov[i].iov_base;
+		if (!i)
+			data_addr += msg->msg_iter.iov_offset;
+		page_addr =
+			(void *)((unsigned long)data_addr & PAGE_MASK);
+
+		data_off = (unsigned long)data_addr & ~PAGE_MASK;
+		nr_pages =
+			DIV_ROUND_UP(data_off + msg->msg_iter.iov[i].iov_len,
+				     PAGE_SIZE);
+		if (nr_mapped + nr_pages > QUIC_MAX_PLAIN_PAGES) {
+			quic_put_plain_user_pages(pages, nr_mapped);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		count = get_user_pages((unsigned long)page_addr, nr_pages, 1,
+				       pages, NULL);
+		if (count < (long)nr_pages) {
+			quic_put_plain_user_pages(pages,
+						  nr_mapped + max_t(long, count, 0));
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		page_indices[i] = nr_mapped;
+		nr_mapped += count;
+		pages += count;
+	}
+	ret = nr_mapped;
+
+out:
+	return ret;
+}
+
+static int quic_sg_plain_from_mapped_msg(struct msghdr * const msg,
+					 struct page **plain_pages,
+					 void **iov_base_ptrs,
+					 void **iov_data_ptrs,
+					 const size_t plain_size,
+					 const size_t pkt_size,
+					 struct scatterlist * const sg_alloc,
+					 const size_t max_sg_alloc,
+					 struct scatterlist ** const sg_pkts,
+					 size_t *nr_plain_pages)
+{
+	int iov_page_indices[QUIC_MAX_IOVEC_SEGMENTS];
+	struct scatterlist *sg;
+	unsigned int pkt_i = 0;
+	ssize_t left_on_page;
+	size_t pkt_left;
+	unsigned int i;
+	size_t seg_len;
+	off_t page_ofs;
+	off_t seg_ofs;
+	int ret = 0;
+	int page_i;
+
+	if (msg->msg_iter.nr_segs >= QUIC_MAX_IOVEC_SEGMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = quic_get_plain_user_pages(msg, plain_pages, iov_page_indices);
+	if (ret < 0)
+		goto out;
+
+	*nr_plain_pages = ret;
+	sg = sg_alloc;
+	sg_pkts[pkt_i] = sg;
+	sg_unmark_end(sg);
+	pkt_left = pkt_size;
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		page_ofs = ((unsigned long)msg->msg_iter.iov[i].iov_base
+			   & (PAGE_SIZE - 1));
+		page_i = 0;
+		if (!i) {
+			page_ofs += msg->msg_iter.iov_offset;
+			while (page_ofs >= PAGE_SIZE) {
+				page_ofs -= PAGE_SIZE;
+				page_i++;
+			}
+		}
+
+		seg_len = msg->msg_iter.iov[i].iov_len;
+		page_i += iov_page_indices[i];
+
+		if (page_i >= QUIC_MAX_PLAIN_PAGES)
+			return -EFAULT;
+
+		seg_ofs = 0;
+		while (seg_ofs < seg_len) {
+			if (sg - sg_alloc > max_sg_alloc)
+				return -EFAULT;
+
+			sg_unmark_end(sg);
+			left_on_page = min_t(size_t, PAGE_SIZE - page_ofs,
+					     seg_len - seg_ofs);
+			if (left_on_page <= 0)
+				return -EFAULT;
+
+			if (left_on_page > pkt_left) {
+				sg_set_page(sg, plain_pages[page_i], pkt_left,
+					    page_ofs);
+				pkt_i++;
+				seg_ofs += pkt_left;
+				page_ofs += pkt_left;
+				sg_mark_end(sg);
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+				continue;
+			}
+			sg_set_page(sg, plain_pages[page_i], left_on_page,
+				    page_ofs);
+			page_i++;
+			page_ofs = 0;
+			seg_ofs += left_on_page;
+			pkt_left -= left_on_page;
+			if (pkt_left == 0 ||
+			    (seg_ofs == seg_len &&
+			     i == msg->msg_iter.nr_segs - 1)) {
+				sg_mark_end(sg);
+				pkt_i++;
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+			} else {
+				sg++;
+			}
+		}
+	}
+
+	if (pkt_left && pkt_left != pkt_size) {
+		pkt_i++;
+		sg_mark_end(sg);
+	}
+	ret = pkt_i;
+
+out:
+	return ret;
+}
+
+/* sg_alloc: allocated zeroed array of scatterlists
+ * cipher_page: preallocated compound page
+ */
+static int quic_sg_cipher_from_pkts(const size_t cipher_tag_size,
+				    const size_t plain_pkt_size,
+				    const size_t plain_size,
+				    struct page * const cipher_page,
+				    struct scatterlist * const sg_alloc,
+				    const size_t nr_sg_alloc,
+				    struct scatterlist ** const sg_cipher)
+{
+	const size_t cipher_pkt_size = plain_pkt_size + cipher_tag_size;
+	size_t pkts = DIV_ROUND_UP(plain_size, plain_pkt_size);
+	struct scatterlist *sg = sg_alloc;
+	int pkt_i;
+	void *ptr;
+
+	if (pkts > nr_sg_alloc)
+		return -EINVAL;
+
+	ptr = page_address(cipher_page);
+	for (pkt_i = 0; pkt_i < pkts;
+		++pkt_i, ptr += cipher_pkt_size, ++sg) {
+		sg_set_buf(sg, ptr, cipher_pkt_size);
+		sg_mark_end(sg);
+		sg_cipher[pkt_i] = sg;
+	}
+	return pkts;
+}
+
+/* fast copy from scatterlist to a buffer assuming that all pages are
+ * available in kernel memory.
+ */
+static int quic_sg_pcopy_to_buffer_kernel(struct scatterlist *sg,
+					  u8 *buffer,
+					  size_t bytes_to_copy,
+					  off_t offset_to_read)
+{
+	off_t sg_remain = sg->length;
+	size_t to_copy;
+
+	if (!bytes_to_copy)
+		return 0;
+
+	/* skip to offset first */
+	while (offset_to_read > 0) {
+		if (!sg_remain)
+			return -EINVAL;
+		if (offset_to_read < sg_remain) {
+			sg_remain -= offset_to_read;
+			break;
+		}
+		offset_to_read -= sg_remain;
+		sg = sg_next(sg);
+		if (!sg)
+			return -EINVAL;
+		sg_remain = sg->length;
+	}
+
+	/* traverse sg list from offset to offset + bytes_to_copy */
+	while (bytes_to_copy) {
+		to_copy = min_t(size_t, bytes_to_copy, sg_remain);
+		if (!to_copy)
+			return -EINVAL;
+		memcpy(buffer, sg_virt(sg) + (sg->length - sg_remain), to_copy);
+		buffer += to_copy;
+		bytes_to_copy -= to_copy;
+		if (bytes_to_copy) {
+			sg = sg_next(sg);
+			if (!sg)
+				return -EINVAL;
+			sg_remain = sg->length;
+		}
+	}
+
+	return 0;
+}
+
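+/* Copy the QUIC short header out of the plaintext scatterlist into a
+ * linear buffer: one flags byte, the destination connection ID and the
+ * truncated packet number, whose length is encoded in the two low bits
+ * of the flags byte. Returns the header length.
+ */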
+static int quic_copy_header(struct scatterlist *sg_plain,
+			    u8 *buf, const size_t buf_len,
+			    const size_t conn_id_len)
+{
+	u8 *pkt = sg_virt(sg_plain);
+	size_t hdr_len;
+
+	hdr_len = 1 + conn_id_len + ((*pkt & 0x03) + 1);
+	if (hdr_len > QUIC_MAX_SHORT_HEADER_SIZE || hdr_len > buf_len)
+		return -EINVAL;
+
+	WARN_ON_ONCE(quic_sg_pcopy_to_buffer_kernel(sg_plain, buf, hdr_len, 0));
+	return hdr_len;
+}
+
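+/* Reconstruct the full packet number from the truncated value carried in
+ * the header, following the decoding algorithm of RFC 9000 Appendix A,
+ * with the expected value taken from next_pkt_num in the ancillary data.
+ */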
+static u64 quic_unpack_pkt_num(struct quic_tx_ancillary_data * const control,
+			       const u8 * const hdr,
+			       const off_t payload_crypto_off)
+{
+	u64 truncated_pn = 0;
+	u64 candidate_pn;
+	u64 expected_pn;
+	u64 pn_hwin;
+	u64 pn_mask;
+	u64 pn_len;
+	u64 pn_win;
+	int i;
+
+	pn_len = (hdr[0] & 0x03) + 1;
+	expected_pn = control->next_pkt_num;
+
+	for (i = 1 + control->dst_conn_id_length; i < payload_crypto_off; ++i) {
+		truncated_pn <<= 8;
+		truncated_pn |= hdr[i];
+	}
+
+	pn_win = 1ULL << (pn_len << 3);
+	pn_hwin = pn_win >> 1;
+	pn_mask = pn_win - 1;
+	candidate_pn = (expected_pn & ~pn_mask) | truncated_pn;
+
+	if (expected_pn > pn_hwin &&
+	    candidate_pn <= expected_pn - pn_hwin &&
+	    candidate_pn < (1ULL << 62) - pn_win)
+		return candidate_pn + pn_win;
+
+	if (candidate_pn > expected_pn + pn_hwin &&
+	    candidate_pn >= pn_win)
+		return candidate_pn - pn_win;
+
+	return candidate_pn;
+}
+
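+/* Derive the header protection mask from a sample of the packet
+ * ciphertext, as described in RFC 9001 section 5.4. AES based ciphers
+ * encrypt the sample itself; ChaCha20-Poly1305 uses the first four bytes
+ * of the sample as the block counter and the rest as the nonce to
+ * encrypt zero bytes.
+ */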
+static int
+quic_construct_header_prot_mask(struct quic_internal_crypto_context *crypto_ctx,
+				struct skcipher_request *hdr_mask_req,
+				struct scatterlist *sg_cipher_pkt,
+				off_t sample_offset,
+				u8 *hdr_mask)
+{
+	u8 *sample = sg_virt(sg_cipher_pkt) + sample_offset;
+	u8 hdr_ctr[sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE];
+	u8 chacha20_zeros[5] = {0, 0, 0, 0, 0};
+	struct scatterlist sg_cipher_sample;
+	struct scatterlist sg_hdr_mask;
+	struct crypto_wait wait_header;
+	__le32	counter;
+
+	BUILD_BUG_ON(QUIC_HDR_MASK_SIZE
+		     < sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE);
+
+	sg_init_one(&sg_hdr_mask, hdr_mask, QUIC_HDR_MASK_SIZE);
+	skcipher_request_set_callback(hdr_mask_req, 0, crypto_req_done,
+				      &wait_header);
+
+	if (crypto_ctx->conn_info.cipher_type == TLS_CIPHER_CHACHA20_POLY1305) {
+		sg_init_one(&sg_cipher_sample, (u8 *)chacha20_zeros,
+			    sizeof(chacha20_zeros));
+		counter = cpu_to_le32(*((u32 *)sample));
+		memset(hdr_ctr, 0, sizeof(hdr_ctr));
+		memcpy((u8 *)hdr_ctr, (u8 *)&counter, sizeof(u32));
+		memcpy((u8 *)hdr_ctr + sizeof(u32),
+		       (sample + sizeof(u32)),
+		       QUIC_CIPHER_MAX_IV_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, 5, hdr_ctr);
+	} else {
+		/* Cipher pages are contiguous and allocated in kernel memory,
+		 * so take the pointer to the sg data directly.
+		 */
+		sg_init_one(&sg_cipher_sample, sample, QUIC_HDR_MASK_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, QUIC_HDR_MASK_SIZE,
+					   NULL);
+	}
+
+	return crypto_wait_req(crypto_skcipher_encrypt(hdr_mask_req),
+			       &wait_header);
+}
+
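+/* Apply header protection (RFC 9001 section 5.4.1): the mask is derived
+ * from a ciphertext sample starting four bytes past the beginning of the
+ * packet number field and XORed into the low five bits of the first byte
+ * and into the packet number bytes.
+ */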
+static int quic_protect_header(struct quic_internal_crypto_context *crypto_ctx,
+			       struct quic_tx_ancillary_data *control,
+			       struct skcipher_request *hdr_mask_req,
+			       struct scatterlist *sg_cipher_pkt,
+			       int payload_crypto_off)
+{
+	u8 hdr_mask[QUIC_HDR_MASK_SIZE];
+	off_t quic_pkt_num_off;
+	u8 quic_pkt_num_len;
+	u8 *cipher_hdr;
+	int err;
+	int i;
+
+	quic_pkt_num_off = 1 + control->dst_conn_id_length;
+	quic_pkt_num_len = payload_crypto_off - quic_pkt_num_off;
+
+	if (quic_pkt_num_len > 4)
+		return -EPERM;
+
+	err = quic_construct_header_prot_mask(crypto_ctx, hdr_mask_req,
+					      sg_cipher_pkt,
+					      payload_crypto_off +
+					      (4 - quic_pkt_num_len),
+					      hdr_mask);
+	if (unlikely(err))
+		return err;
+
+	cipher_hdr = sg_virt(sg_cipher_pkt);
+	/* protect the public flags */
+	cipher_hdr[0] ^= (hdr_mask[0] & 0x1f);
+
+	for (i = 0; i < quic_pkt_num_len; ++i)
+		cipher_hdr[quic_pkt_num_off + i] ^= hdr_mask[1 + i];
+
+	return 0;
+}
+
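+/* Build the per-packet AEAD nonce (RFC 9001 section 5.3): the packet
+ * number, left-padded with zeros to the IV length, is XORed with the
+ * connection's payload IV.
+ */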
+static
+void quic_construct_ietf_nonce(u8 *nonce,
+			       struct quic_internal_crypto_context *crypto_ctx,
+			       u64 quic_pkt_num)
+{
+	u8 *iv = quic_payload_iv(crypto_ctx);
+	int i;
+
+	for (i = quic_crypto_nonce_size(crypto_ctx->conn_info.cipher_type) - 1;
+	     i >= 0 && quic_pkt_num;
+	     --i, quic_pkt_num >>= 8)
+		nonce[i] = iv[i] ^ (u8)quic_pkt_num;
+
+	for (; i >= 0; --i)
+		nonce[i] = iv[i];
+}
+
+static ssize_t quic_sendpage(struct quic_context *ctx,
+			     struct sock *sk,
+			     struct msghdr *msg,
+			     const size_t cipher_size,
+			     struct page * const cipher_page)
+{
+	struct kvec iov;
+	ssize_t ret;
+
+	iov.iov_base = page_address(cipher_page);
+	iov.iov_len = cipher_size;
+	iov_iter_kvec(&msg->msg_iter, WRITE, &iov, 1, cipher_size);
+	ret = security_socket_sendmsg(sk->sk_socket, msg, msg_data_left(msg));
+	if (ret)
+		return ret;
+
+	ret = ctx->sk_proto->sendmsg(sk, msg, msg_data_left(msg));
+	WARN_ON(ret == -EIOCBQUEUED);
+	return ret;
+}
+
+static int quic_extract_dst_address_info(struct sock *sk, struct msghdr *msg,
+					 sa_family_t *sa_family, void **daddr,
+					  __be16 *dport)
+{
+	DECLARE_SOCKADDR(struct sockaddr_in6 *, usin6, msg->msg_name);
+	DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
+	struct inet_sock *inet = inet_sk(sk);
+	struct ipv6_pinfo *np = inet6_sk(sk);
+
+	if (usin6) {
+		/* dst address is provided in msg */
+		*sa_family = usin6->sin6_family;
+		switch (*sa_family) {
+		case AF_INET:
+			if (msg->msg_namelen < sizeof(*usin))
+				return -EINVAL;
+			*daddr = &usin->sin_addr.s_addr;
+			*dport = usin->sin_port;
+			break;
+		case AF_INET6:
+			if (msg->msg_namelen < sizeof(*usin6))
+				return -EINVAL;
+			*daddr = &usin6->sin6_addr;
+			*dport = usin6->sin6_port;
+			break;
+		default:
+			return -EAFNOSUPPORT;
+		}
+	} else {
+		/* socket should be connected */
+		if (sk->sk_state != TCP_ESTABLISHED)
+			return -EDESTADDRREQ;
+		if (np) {
+			*sa_family = AF_INET6;
+			*daddr = &sk->sk_v6_daddr;
+			*dport = inet->inet_dport;
+		} else if (inet) {
+			*sa_family = AF_INET;
+			*daddr = &sk->sk_daddr;
+			*dport = inet->inet_dport;
+		} else {
+			return -EAFNOSUPPORT;
+		}
+	}
+
+	if (!*dport || !*daddr)
+		return -EINVAL;
+
+	if (*sa_family == AF_INET6 &&
+	    ipv6_addr_v4mapped((struct in6_addr *)(*daddr))) {
+		*daddr = &((struct in6_addr *)(*daddr))->s6_addr32[3];
+		*sa_family = AF_INET;
+	}
+
+	return 0;
+}
+
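+/* Tx path for 1-RTT short header packets: validate the message, parse the
+ * UDP_QUIC_ENCRYPT and UDP_SEGMENT control messages, look up the flow by
+ * connection ID and destination, pin the user pages and split them into
+ * per-packet scatterlists, encrypt each packet and protect its header,
+ * then hand the ciphertext page to the underlying UDP socket.
+ */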
+static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_internal_crypto_context *crypto_ctx = NULL;
+	struct scatterlist *sg_cipher_pkts[QUIC_MAX_GSO_FRAGS];
+	struct scatterlist *sg_plain_pkts[QUIC_MAX_GSO_FRAGS];
+	struct page *plain_pages[QUIC_MAX_PLAIN_PAGES];
+	void *plain_base_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	void *plain_data_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	struct msghdr msg_cipher = {
+		.msg_name = msg->msg_name,
+		.msg_namelen = msg->msg_namelen,
+		.msg_flags = msg->msg_flags,
+		.msg_control = msg->msg_control,
+		.msg_controllen = msg->msg_controllen,
+	};
+	struct quic_connection_rhash *connhash = NULL;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	u8 hdr_buf[QUIC_MAX_SHORT_HEADER_SIZE];
+	struct skcipher_request *hdr_mask_req;
+	struct quic_tx_ancillary_data control;
+	struct	aead_request *aead_req = NULL;
+	u8 nonce[QUIC_CIPHER_MAX_NONCE_SIZE];
+	struct scatterlist *sg_cipher = NULL;
+	struct udp_sock *up = udp_sk(sk);
+	struct scatterlist *sg_plain = NULL;
+	u16 gso_pkt_size = up->gso_size;
+	size_t last_plain_pkt_size = 0;
+	off_t	payload_crypto_offset;
+	struct crypto_aead *tfm = NULL;
+	size_t nr_plain_pages = 0;
+	struct crypto_wait waiter;
+	size_t nr_sg_cipher_pkts;
+	size_t nr_sg_plain_pkts;
+	u8 conn_payload_key_gen;
+	ssize_t hdr_buf_len = 0;
+	size_t nr_sg_alloc = 0;
+	size_t plain_pkt_size;
+	sa_family_t sa_family;
+	u64	full_pkt_num;
+	size_t cipher_size;
+	size_t plain_size;
+	size_t pkt_size;
+	size_t tag_size;
+	__be16 dport;
+	int ret = 0;
+	void *daddr;
+	int pkt_i;
+	int err;
+
+	memset(&hdr_buf[0], 0, QUIC_MAX_SHORT_HEADER_SIZE);
+	hdr_buf_len = copy_from_iter(hdr_buf, QUIC_MAX_SHORT_HEADER_SIZE,
+				     &msg->msg_iter);
+	if (hdr_buf_len <= 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+	iov_iter_revert(&msg->msg_iter, hdr_buf_len);
+
+	ctx = quic_get_ctx(sk);
+
+	// Bypass anything that is guaranteed not to be a QUIC packet.
+	plain_size = len;
+
+	if (plain_size < 2)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Bypass for other than short header.
+	if ((hdr_buf[0] & 0xc0) != 0x40)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Crypto adds a tag after the packet. Corking a payload would produce
+	// a crypto tag after each portion. Use GSO instead.
+	if ((msg->msg_flags & MSG_MORE) || up->pending) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = quic_sendmsg_validate(msg);
+	if (ret)
+		goto out;
+
+	ret = quic_extract_ancillary_data(msg, &control, &gso_pkt_size);
+	if (ret)
+		goto out;
+
+	// Reject ancillary data carrying reserved/unknown flag bits.
+	if (control.flags & ~QUIC_ANCILLARY_FLAGS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Bypass the offload on request. A bypass requested for the first
+	// packet applies to all packets in the GSO batch.
+	if (control.flags & QUIC_BYPASS_ENCRYPTION)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	if (hdr_buf_len < 1 + control.dst_conn_id_length) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	conn_payload_key_gen = (hdr_buf[0] & 0x04) >> 2;
+
+	ret = quic_extract_dst_address_info(sk, msg, &sa_family, &daddr,
+					    &dport);
+	if (ret)
+		goto out;
+
+	// Fetch the flow
+	connhash = quic_lookup_connection(ctx, &hdr_buf[1], &control,
+					  sa_family, daddr, dport);
+	if (!connhash) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	crypto_ctx = &connhash->crypto_ctx;
+	tag_size = quic_crypto_tag_size(crypto_ctx->conn_info.cipher_type);
+
+	if (crypto_ctx->conn_info.conn_payload_key_gen !=
+	    conn_payload_key_gen) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// For GSO, use the GSO size minus cipher tag size as the packet size;
+	// for non-GSO, use the size of the whole plaintext.
+	// Reduce the packet size by tag size to keep the original packet size
+	// for the rest of the UDP path in the stack.
+	if (!gso_pkt_size) {
+		plain_pkt_size = plain_size;
+	} else {
+		if (gso_pkt_size <= tag_size) {
+			ret = -EINVAL;
+			goto out;
+		}
+		plain_pkt_size = gso_pkt_size - tag_size;
+	}
+
+	// Build scatterlist from the input data, split by GSO minus the
+	// crypto tag size.
+	nr_sg_alloc = quic_sg_capacity_from_msg(plain_pkt_size,
+						msg->msg_iter.iov_offset,
+						plain_size);
+	if ((nr_sg_alloc * 2) >= QUIC_MAX_SG_ALLOC_ELEMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	sg_plain = ctx->sg_alloc;
+	sg_cipher = sg_plain + nr_sg_alloc;
+
+	ret = quic_sg_plain_from_mapped_msg(msg, plain_pages,
+					    plain_base_ptrs,
+					    plain_data_ptrs, plain_size,
+					    plain_pkt_size, sg_plain,
+					    nr_sg_alloc, sg_plain_pkts,
+					    &nr_plain_pages);
+
+	if (ret < 0)
+		goto out_put_pages;
+
+	nr_sg_plain_pkts = ret;
+	last_plain_pkt_size = plain_size % plain_pkt_size;
+	if (!last_plain_pkt_size)
+		last_plain_pkt_size = plain_pkt_size;
+
+	// Build scatterlist for the ciphertext, split by GSO.
+	cipher_size = plain_size + nr_sg_plain_pkts * tag_size;
+
+	if (DIV_ROUND_UP(cipher_size, PAGE_SIZE)
+	    >= (1 << QUIC_MAX_CIPHER_PAGES_ORDER)) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	ret = quic_sg_cipher_from_pkts(tag_size, plain_pkt_size, plain_size,
+				       ctx->cipher_page, sg_cipher, nr_sg_alloc,
+				       sg_cipher_pkts);
+	if (ret < 0)
+		goto out_put_pages;
+
+	nr_sg_cipher_pkts = ret;
+
+	if (nr_sg_plain_pkts != nr_sg_cipher_pkts) {
+		ret = -EPERM;
+		goto out_put_pages;
+	}
+
+	// Encrypt and protect header for each packet individually.
+	tfm = crypto_ctx->packet_aead;
+	crypto_aead_clear_flags(tfm, ~0);
+	aead_req = aead_request_alloc(tfm, GFP_KERNEL);
+	if (!aead_req) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	hdr_mask_req = skcipher_request_alloc(crypto_ctx->header_tfm,
+					      GFP_KERNEL);
+	if (!hdr_mask_req) {
+		aead_request_free(aead_req);
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	for (pkt_i = 0; pkt_i < nr_sg_plain_pkts; ++pkt_i) {
+		payload_crypto_offset =
+			quic_copy_header(sg_plain_pkts[pkt_i],
+					 hdr_buf,
+					 sizeof(hdr_buf),
+					 control.dst_conn_id_length);
+		if (payload_crypto_offset < 0) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = payload_crypto_offset;
+			goto out_put_pages;
+		}
+
+		full_pkt_num = quic_unpack_pkt_num(&control, hdr_buf,
+						   payload_crypto_offset);
+
+		pkt_size = (pkt_i + 1 < nr_sg_plain_pkts
+				? plain_pkt_size
+				: last_plain_pkt_size);
+		if (pkt_size <= payload_crypto_offset) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		pkt_size -= payload_crypto_offset;
+
+		/* Construct nonce and initialize request */
+		quic_construct_ietf_nonce(nonce, crypto_ctx, full_pkt_num);
+
+		/* Encrypt the body */
+		aead_request_set_callback(aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG
+					  | CRYPTO_TFM_REQ_MAY_SLEEP,
+					  crypto_req_done, &waiter);
+		aead_request_set_crypt(aead_req, sg_plain_pkts[pkt_i],
+				       sg_cipher_pkts[pkt_i],
+				       pkt_size,
+				       nonce);
+		aead_request_set_ad(aead_req, payload_crypto_offset);
+		err = crypto_wait_req(crypto_aead_encrypt(aead_req), &waiter);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+
+		/* Protect the header */
+		memcpy(sg_virt(sg_cipher_pkts[pkt_i]), hdr_buf,
+		       payload_crypto_offset);
+
+		err = quic_protect_header(crypto_ctx, &control,
+					  hdr_mask_req,
+					  sg_cipher_pkts[pkt_i],
+					  payload_crypto_offset);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+	}
+	skcipher_request_free(hdr_mask_req);
+	aead_request_free(aead_req);
+
+	// Deliver to the next layer.
+	if (ctx->sk_proto->sendpage) {
+		msg_cipher.msg_flags |= MSG_MORE;
+		err = ctx->sk_proto->sendmsg(sk, &msg_cipher, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+
+		err = ctx->sk_proto->sendpage(sk, ctx->cipher_page, 0,
+					      cipher_size, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+		if (err != cipher_size) {
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		ret = plain_size;
+	} else {
+		ret = quic_sendpage(ctx, sk, &msg_cipher, cipher_size,
+				    ctx->cipher_page);
+		// indicate full plaintext transmission to the caller.
+		if (ret > 0)
+			ret = plain_size;
+	}
+
+out_put_pages:
+	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
+
+out:
+	return ret;
+}
+
+static int quic_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_context *ctx;
+	int ret;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	rcu_read_unlock();
+	if (!ctx)
+		return -EINVAL;
+
+	mutex_lock(&ctx->sendmsg_mux);
+	ret = quic_sendmsg(sk, msg, len);
+	mutex_unlock(&ctx->sendmsg_mux);
+	return ret;
+}
+
+static void quic_release_resources(struct sock *sk)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_connection_rhash *connhash;
+	struct inet_sock *inet = inet_sk(sk);
+	struct rhashtable_iter hti;
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	if (!ctx) {
+		rcu_read_unlock();
+		return;
+	}
+
+	sk_proto = ctx->sk_proto;
+
+	rhashtable_walk_enter(&ctx->tx_connections, &hti);
+	rhashtable_walk_start(&hti);
+
+	while ((connhash = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(connhash)) {
+			if (PTR_ERR(connhash) == -EAGAIN)
+				continue;
+			break;
+		}
+
+		crypto_ctx = &connhash->crypto_ctx;
+		crypto_free_aead(crypto_ctx->packet_aead);
+		crypto_free_skcipher(crypto_ctx->header_tfm);
+		memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	}
+
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+	rhashtable_destroy(&ctx->tx_connections);
+
+	if (ctx->cipher_page) {
+		quic_free_cipher_page(ctx->cipher_page);
+		ctx->cipher_page = NULL;
+	}
+
+	rcu_read_unlock();
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, NULL);
+	WRITE_ONCE(sk->sk_prot, sk_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+
+	kfree_rcu(ctx, rcu);
+}
+
+static void
+quic_prep_protos(unsigned int af, struct proto *proto, const struct proto *base)
+{
+	if (likely(test_bit(af, &af_init_done)))
+		return;
+
+	spin_lock(&quic_proto_lock);
+	if (test_bit(af, &af_init_done))
+		goto out_unlock;
+
+	*proto			= *base;
+	proto->setsockopt	= quic_setsockopt;
+	proto->getsockopt	= quic_getsockopt;
+	proto->sendmsg		= quic_sendmsg_locked;
+
+	smp_mb__before_atomic(); /* proto calls should be visible first */
+	set_bit(af, &af_init_done);
+
+out_unlock:
+	spin_unlock(&quic_proto_lock);
+}
+
+static void quic_update_proto(struct sock *sk, struct quic_context *ctx)
+{
+	struct proto *udp_proto, *quic_proto;
+	struct inet_sock *inet = inet_sk(sk);
+
+	udp_proto = READ_ONCE(sk->sk_prot);
+	ctx->sk_proto = udp_proto;
+	quic_proto = sk->sk_family == AF_INET ? &quic_v4_proto : &quic_v6_proto;
+
+	quic_prep_protos(sk->sk_family, quic_proto, udp_proto);
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, ctx);
+	WRITE_ONCE(sk->sk_prot, quic_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+}
+
+static int quic_init(struct sock *sk)
+{
+	struct quic_context *ctx;
+
+	ctx = quic_ctx_create();
+	if (!ctx)
+		return -ENOMEM;
+
+	quic_update_proto(sk, ctx);
+
+	return 0;
+}
+
+static void quic_release(struct sock *sk)
+{
+	lock_sock(sk);
+	quic_release_resources(sk);
+	release_sock(sk);
+}
+
+static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
+	.name		= "quic-crypto",
+	.owner		= THIS_MODULE,
+	.init		= quic_init,
+	.release	= quic_release,
+};
+
+static int __init quic_register(void)
+{
+	udp_register_ulp(&quic_ulp_ops);
+	return 0;
+}
+
+static void __exit quic_unregister(void)
+{
+	udp_unregister_ulp(&quic_ulp_ops);
+}
+
+module_init(quic_register);
+module_exit(quic_unregister);
+
+MODULE_DESCRIPTION("QUIC crypto ULP");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_UDP_ULP("quic-crypto");
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v3 5/6] net: Add flow counters and Tx processing error counter
  2022-09-07  0:49 ` [net-next v3 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (3 preceding siblings ...)
  2022-09-07  0:49   ` [net-next v3 4/6] net: Implement QUIC offload functions Adel Abouchaev
@ 2022-09-07  0:49   ` Adel Abouchaev
  2022-09-07  0:49   ` [net-next v3 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-07  0:49 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, netdev,
	linux-doc, linux-kselftest

Add flow counters. The total flow counter is cumulative, the current flow
counter shows the number of flows presently in flight, and the error counter
accumulates the number of errors encountered during Tx processing.
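
The counters are exported through /proc/net/quic_stat, following the style
of the TLS MIB. Illustrative output (the values here are made up):

  $ cat /proc/net/quic_stat
  QuicCurrTxSw    2
  QuicTxSw        15
  QuicTxSwError   0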

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Updated enum bracket to follow enum keyword. Removed extra blank lines.
---
 include/net/netns/mib.h   |  3 +++
 include/net/quic.h        | 10 +++++++++
 include/net/snmp.h        |  6 +++++
 include/uapi/linux/snmp.h |  9 ++++++++
 net/quic/Makefile         |  2 +-
 net/quic/quic_main.c      | 46 +++++++++++++++++++++++++++++++++++++++
 net/quic/quic_proc.c      | 45 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 net/quic/quic_proc.c

diff --git a/include/net/netns/mib.h b/include/net/netns/mib.h
index 7e373664b1e7..dcbba3d1ceec 100644
--- a/include/net/netns/mib.h
+++ b/include/net/netns/mib.h
@@ -24,6 +24,9 @@ struct netns_mib {
 #if IS_ENABLED(CONFIG_TLS)
 	DEFINE_SNMP_STAT(struct linux_tls_mib, tls_statistics);
 #endif
+#if IS_ENABLED(CONFIG_QUIC)
+	DEFINE_SNMP_STAT(struct linux_quic_mib, quic_statistics);
+#endif
 #ifdef CONFIG_MPTCP
 	DEFINE_SNMP_STAT(struct mptcp_mib, mptcp_statistics);
 #endif
diff --git a/include/net/quic.h b/include/net/quic.h
index cafe01174e60..6362d827d266 100644
--- a/include/net/quic.h
+++ b/include/net/quic.h
@@ -25,6 +25,16 @@
 #define QUIC_MAX_PLAIN_PAGES		16
 #define QUIC_MAX_CIPHER_PAGES_ORDER	4
 
+#define __QUIC_INC_STATS(net, field)				\
+	__SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_INC_STATS(net, field)				\
+	SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_DEC_STATS(net, field)				\
+	SNMP_DEC_STATS((net)->mib.quic_statistics, field)
+
+int __net_init quic_proc_init(struct net *net);
+void __net_exit quic_proc_fini(struct net *net);
+
 struct quic_internal_crypto_context {
 	struct quic_connection_info	conn_info;
 	struct crypto_skcipher		*header_tfm;
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 468a67836e2f..f94680a3e9e8 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -117,6 +117,12 @@ struct linux_tls_mib {
 	unsigned long	mibs[LINUX_MIB_TLSMAX];
 };
 
+/* Linux QUIC */
+#define LINUX_MIB_QUICMAX	__LINUX_MIB_QUICMAX
+struct linux_quic_mib {
+	unsigned long	mibs[LINUX_MIB_QUICMAX];
+};
+
 #define DEFINE_SNMP_STAT(type, name)	\
 	__typeof__(type) __percpu *name
 #define DEFINE_SNMP_STAT_ATOMIC(type, name)	\
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 4d7470036a8b..ca1e626dbdb4 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -349,4 +349,13 @@ enum
 	__LINUX_MIB_TLSMAX
 };
 
+/* linux QUIC mib definitions */
+enum {
+	LINUX_MIB_QUICNUM = 0,
+	LINUX_MIB_QUICCURRTXSW,			/* QuicCurrTxSw */
+	LINUX_MIB_QUICTXSW,			/* QuicTxSw */
+	LINUX_MIB_QUICTXSWERROR,		/* QuicTxSwError */
+	__LINUX_MIB_QUICMAX
+};
+
 #endif	/* _LINUX_SNMP_H */
diff --git a/net/quic/Makefile b/net/quic/Makefile
index 928239c4d08c..a885cd8bc4e0 100644
--- a/net/quic/Makefile
+++ b/net/quic/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_QUIC) += quic.o
 
-quic-y := quic_main.o
+quic-y := quic_main.o quic_proc.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
index a43d989a1c8e..1fda1083ee25 100644
--- a/net/quic/quic_main.c
+++ b/net/quic/quic_main.c
@@ -359,6 +359,8 @@ static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
 	if (rc < 0)
 		goto err_free_ciphers;
 
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSW);
 	return 0;
 
 err_free_ciphers:
@@ -416,6 +418,7 @@ static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
 	crypto_free_aead(crypto_ctx->packet_aead);
 	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
 	kfree(connhash);
+	QUIC_DEC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
 
 	return 0;
 }
@@ -441,6 +444,9 @@ static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 		break;
 	}
 
+	if (rc)
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return rc;
 }
 
@@ -1329,6 +1335,9 @@ static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
 
 out:
+	if (unlikely(ret < 0))
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return ret;
 }
 
@@ -1461,6 +1470,36 @@ static void quic_release(struct sock *sk)
 	release_sock(sk);
 }
 
+static int __net_init quic_init_net(struct net *net)
+{
+	int err;
+
+	net->mib.quic_statistics = alloc_percpu(struct linux_quic_mib);
+	if (!net->mib.quic_statistics)
+		return -ENOMEM;
+
+	err = quic_proc_init(net);
+	if (err)
+		goto err_free_stats;
+
+	return 0;
+
+err_free_stats:
+	free_percpu(net->mib.quic_statistics);
+	return err;
+}
+
+static void __net_exit quic_exit_net(struct net *net)
+{
+	quic_proc_fini(net);
+	free_percpu(net->mib.quic_statistics);
+}
+
+static struct pernet_operations quic_proc_ops = {
+	.init = quic_init_net,
+	.exit = quic_exit_net,
+};
+
 static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 	.name		= "quic-crypto",
 	.owner		= THIS_MODULE,
@@ -1470,6 +1509,12 @@ static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 
 static int __init quic_register(void)
 {
+	int err;
+
+	err = register_pernet_subsys(&quic_proc_ops);
+	if (err)
+		return err;
+
 	udp_register_ulp(&quic_ulp_ops);
 	return 0;
 }
@@ -1477,6 +1522,7 @@ static int __init quic_register(void)
 static void __exit quic_unregister(void)
 {
 	udp_unregister_ulp(&quic_ulp_ops);
+	unregister_pernet_subsys(&quic_proc_ops);
 }
 
 module_init(quic_register);
diff --git a/net/quic/quic_proc.c b/net/quic/quic_proc.c
new file mode 100644
index 000000000000..cb4fe7a589b5
--- /dev/null
+++ b/net/quic/quic_proc.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Meta Platforms, Inc. */
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <net/snmp.h>
+#include <net/quic.h>
+
+#ifdef CONFIG_PROC_FS
+static const struct snmp_mib quic_mib_list[] = {
+	SNMP_MIB_ITEM("QuicCurrTxSw", LINUX_MIB_QUICCURRTXSW),
+	SNMP_MIB_ITEM("QuicTxSw", LINUX_MIB_QUICTXSW),
+	SNMP_MIB_ITEM("QuicTxSwError", LINUX_MIB_QUICTXSWERROR),
+	SNMP_MIB_SENTINEL
+};
+
+static int quic_statistics_seq_show(struct seq_file *seq, void *v)
+{
+	unsigned long buf[LINUX_MIB_QUICMAX] = {};
+	struct net *net = seq->private;
+	int i;
+
+	snmp_get_cpu_field_batch(buf, quic_mib_list, net->mib.quic_statistics);
+	for (i = 0; quic_mib_list[i].name; i++)
+		seq_printf(seq, "%-32s\t%lu\n", quic_mib_list[i].name, buf[i]);
+
+	return 0;
+}
+#endif
+
+int __net_init quic_proc_init(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	if (!proc_create_net_single("quic_stat", 0444, net->proc_net,
+				    quic_statistics_seq_show, NULL))
+		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
+
+	return 0;
+}
+
+void __net_exit quic_proc_fini(struct net *net)
+{
+	remove_proc_entry("quic_stat", net->proc_net);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v3 6/6] net: Add self tests for ULP operations, flow setup and crypto tests
  2022-09-07  0:49 ` [net-next v3 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (4 preceding siblings ...)
  2022-09-07  0:49   ` [net-next v3 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
@ 2022-09-07  0:49   ` Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-07  0:49 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, netdev,
	linux-doc, linux-kselftest

Add self tests covering ULP operations, flow setup and crypto operations.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Restored the test build. Changed the QUIC context reference variable
names for the keys and IV to match the uAPI.

Updated alignment, added SPDX license line.

v3: Added a ChaCha20-Poly1305 test.
v3: Added a test that sending fails when the key generation bit is wrong.
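
The tests build as part of the net kselftests and can be run, for example,
with the usual kselftest invocation (illustrative):

  make -C tools/testing/selftests TARGETS=net run_tests

or by running quic.sh directly, which is expected to create the network
namespaces (ns11, ns12, ns2) that the tests open.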
---
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    3 +-
 tools/testing/selftests/net/quic.c     | 1369 ++++++++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   46 +
 4 files changed, 1418 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 3d7adee7a3e6..78970a09d73c 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -14,6 +14,7 @@ nettest
 psock_fanout
 psock_snd
 psock_tpacket
+quic
 reuseaddr_conflict
 reuseaddr_ports_exhausted
 reuseport_addr_any
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index f5ac1433c301..b4e9586a2d03 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -44,6 +44,7 @@ TEST_PROGS += arp_ndisc_untracked_subnets.sh
 TEST_PROGS += stress_reuseport_listen.sh
 TEST_PROGS += l2_tos_ttl_inherit.sh
 TEST_PROGS += bind_bhash.sh
+TEST_PROGS += quic.sh
 TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
 TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
 TEST_GEN_FILES =  socket nettest
@@ -59,7 +60,7 @@ TEST_GEN_FILES += ipsec
 TEST_GEN_FILES += ioam6_parser
 TEST_GEN_FILES += gro
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
-TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun tap
+TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun tap quic
 TEST_GEN_FILES += toeplitz
 TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
diff --git a/tools/testing/selftests/net/quic.c b/tools/testing/selftests/net/quic.c
new file mode 100644
index 000000000000..81285a6d9601
--- /dev/null
+++ b/tools/testing/selftests/net/quic.c
@@ -0,0 +1,1369 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/limits.h>
+#include <linux/quic.h>
+#include <linux/socket.h>
+#include <linux/tls.h>
+#include <linux/tcp.h>
+#include <linux/types.h>
+#include <linux/udp.h>
+
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/sendfile.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+
+#include "../kselftest_harness.h"
+
+#define UDP_ULP		105
+
+#ifndef SOL_UDP
+#define SOL_UDP		17
+#endif
+
+// 1. QUIC ULP Registration Test
+
+FIXTURE(quic_ulp)
+{
+	int sfd;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_ulp)
+{
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv4)
+{
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7101,
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv6)
+{
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7102,
+};
+
+FIXTURE_SETUP(quic_ulp)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+FIXTURE_TEARDOWN(quic_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_nonexistent_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "nonexistent", sizeof("nonexistent")), -1);
+	// If the UDP_ULP option itself were unsupported, the error would be ENOPROTOOPT.
+	ASSERT_EQ(errno, ENOENT);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_quic_crypto_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+// 2. QUIC Data Path Operation Tests
+
+#define DO_NOT_SETUP_FLOW 0
+#define SETUP_FLOW 1
+
+#define DO_NOT_USE_CLIENT 0
+#define USE_CLIENT 1
+
+FIXTURE(quic_data)
+{
+	int sfd, c1fd, c2fd;
+	socklen_t len_c1;
+	socklen_t len_c2;
+	socklen_t len_s;
+
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_1;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_2;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_1_net_ns_fd;
+	int client_2_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_data)
+{
+	unsigned int af_client_1;
+	char *client_1_address;
+	unsigned short client_1_port;
+	uint8_t conn_id_1[8];
+	uint8_t conn_1_key[16];
+	uint8_t conn_1_iv[12];
+	uint8_t conn_1_hdr_key[16];
+	size_t conn_id_1_len;
+	bool setup_flow_1;
+	bool use_client_1;
+	unsigned int af_client_2;
+	char *client_2_address;
+	unsigned short client_2_port;
+	uint8_t conn_id_2[8];
+	uint8_t conn_2_key[16];
+	uint8_t conn_2_iv[12];
+	uint8_t conn_2_hdr_key[16];
+	size_t conn_id_2_len;
+	bool setup_flow_2;
+	bool use_client_2;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv4)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.1",
+	.client_1_port = 6667,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6668,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	//.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 6669,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_two_conns)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.1",
+	.client_1_port = 6670,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6671,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6672,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv4_one_conn)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.3",
+	.client_1_port = 6676,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6676,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6677,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv6_one_conn)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.3",
+	.client_1_port = 6678,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6678,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6679,
+};
+
+FIXTURE_SETUP(quic_data)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client_1 == AF_INET) {
+		self->len_c1 = sizeof(self->client_1.addr);
+		self->client_1.addr.sin_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr.sin_addr);
+		self->client_1.addr.sin_port = htons(variant->client_1_port);
+	} else {
+		self->len_c1 = sizeof(self->client_1.addr6);
+		self->client_1.addr6.sin6_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr6.sin6_addr);
+		self->client_1.addr6.sin6_port = htons(variant->client_1_port);
+	}
+
+	if (variant->af_client_2 == AF_INET) {
+		self->len_c2 = sizeof(self->client_2.addr);
+		self->client_2.addr.sin_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr.sin_addr);
+		self->client_2.addr.sin_port = htons(variant->client_2_port);
+	} else {
+		self->len_c2 = sizeof(self->client_2.addr6);
+		self->client_2.addr6.sin6_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr6.sin6_addr);
+		self->client_2.addr6.sin6_port = htons(variant->client_2_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_1_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_1_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns12");
+	self->client_2_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_2_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		self->c1fd = socket(variant->af_client_1, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c1fd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client_1 == AF_INET) {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr,
+					      &self->len_c1), 0);
+		} else {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr6,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr6,
+					      &self->len_c1), 0);
+		}
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		self->c2fd = socket(variant->af_client_2, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c2fd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client_2 == AF_INET) {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr,
+					      &self->len_c2), 0);
+		} else {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr6,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr6,
+					      &self->len_c2), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_data)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+	close(self->c1fd);
+	ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+	close(self->c2fd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_data, send_fail_no_flow)
+{
+	char const *test_str = "test_read";
+	int send_len = 10;
+
+	ASSERT_EQ(strlen(test_str) + 1, send_len);
+	EXPECT_EQ(sendto(self->sfd, test_str, send_len, 0,
+			 &self->client_1.addr, self->len_c1), -1);
+};
+
+TEST_F(quic_data, fail_wrong_key_generation_bit)
+{
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_connection_info conn_1_info;
+	struct quic_connection_info conn_2_info;
+	struct quic_tx_ancillary_data *anc_data;
+	struct cmsghdr *cmsg_hdr;
+	int frag_size = 1200;
+	struct iovec iov[2];
+	int msg_len = 4500;
+	struct msghdr msg;
+	char *test_str_1;
+	char *test_str_2;
+	char *buf_1;
+	char *buf_2;
+	int i;
+
+	test_str_1 = (char *)malloc(9000);
+	test_str_2 = (char *)malloc(9000);
+	memset(test_str_1, 0, 9000);
+	memset(test_str_2, 0, 9000);
+
+	buf_1 = (char *)malloc(10000);
+	buf_2 = (char *)malloc(10000);
+	for (i = 0; i < 9000; i += (1200 - 16)) {
+		test_str_1[i] = 0x44;
+		memcpy(&test_str_1[i + 1], &variant->conn_id_1,
+		       variant->conn_id_1_len);
+		test_str_1[i + 1 + variant->conn_id_1_len] = 0xca;
+
+		test_str_2[i] = 0x44;
+		memcpy(&test_str_2[i + 1], &variant->conn_id_2,
+		       variant->conn_id_2_len);
+		test_str_2[i + 1 + variant->conn_id_2_len] = 0xca;
+	}
+
+	// program the connection into the offload
+	conn_1_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_1_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_1_info.key.dst_conn_id_length = variant->conn_id_1_len;
+	memcpy(conn_1_info.key.dst_conn_id,
+	       &variant->conn_id_1,
+	       variant->conn_id_1_len);
+	conn_1_info.conn_payload_key_gen = 0;
+
+	if (self->client_1.addr.sin_family == AF_INET) {
+		memcpy(&conn_1_info.key.addr.ipv4_addr,
+		       &self->client_1.addr.sin_addr, sizeof(struct in_addr));
+		conn_1_info.key.udp_port = self->client_1.addr.sin_port;
+	} else {
+		memcpy(&conn_1_info.key.addr.ipv6_addr,
+		       &self->client_1.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_1_info.key.udp_port = self->client_1.addr6.sin6_port;
+	}
+
+	conn_2_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_2_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_2_info.key.dst_conn_id_length = variant->conn_id_2_len;
+	memcpy(conn_2_info.key.dst_conn_id,
+	       &variant->conn_id_2,
+	       variant->conn_id_2_len);
+	conn_2_info.conn_payload_key_gen = 0;
+
+	if (self->client_2.addr.sin_family == AF_INET) {
+		memcpy(&conn_2_info.key.addr.ipv4_addr,
+		       &self->client_2.addr.sin_addr, sizeof(struct in_addr));
+		conn_2_info.key.udp_port = self->client_2.addr.sin_port;
+	} else {
+		memcpy(&conn_2_info.key.addr.ipv6_addr,
+		       &self->client_2.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_2_info.key.udp_port = self->client_2.addr6.sin6_port;
+	}
+
+	memcpy(&conn_1_info.aes_gcm_128.payload_key,
+	       &variant->conn_1_key, 16);
+	memcpy(&conn_1_info.aes_gcm_128.payload_iv,
+	       &variant->conn_1_iv, 12);
+	memcpy(&conn_1_info.aes_gcm_128.header_key,
+	       &variant->conn_1_hdr_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_key,
+	       &variant->conn_2_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_iv,
+	       &variant->conn_2_iv, 12);
+	memcpy(&conn_2_info.aes_gcm_128.header_key,
+	       &variant->conn_2_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+
+	if (variant->setup_flow_1)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)), 0);
+
+	if (variant->setup_flow_2)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)), 0);
+
+	iov[0].iov_base = test_str_1;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_1 + 4500;
+	iov[1].iov_len = msg_len;
+
+	msg.msg_name = (self->client_1.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_1.addr
+		       : (void *)&self->client_1.addr6;
+	msg.msg_namelen = self->len_c1;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->flags = 0;
+	anc_data->dst_conn_id_length = variant->conn_id_1_len;
+
+	if (variant->use_client_1)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), -1);
+
+	iov[0].iov_base = test_str_2;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_2 + 4500;
+	iov[1].iov_len = msg_len;
+	msg.msg_name = (self->client_2.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_2.addr
+		       : (void *)&self->client_2.addr6;
+	msg.msg_namelen = self->len_c2;
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->dst_conn_id_length = variant->conn_id_2_len;
+	anc_data->flags = 0;
+
+	if (variant->use_client_2)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), -1);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	if (variant->setup_flow_1) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)),
+			  0);
+	}
+	if (variant->setup_flow_2) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)),
+			  0);
+	}
+	free(test_str_1);
+	free(test_str_2);
+	free(buf_1);
+	free(buf_2);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_data, encrypt_two_conn_gso_1200_iov_2_size_9000_aesgcm128)
+{
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_connection_info conn_1_info;
+	struct quic_connection_info conn_2_info;
+	struct quic_tx_ancillary_data *anc_data;
+	socklen_t recv_addr_len_1;
+	socklen_t recv_addr_len_2;
+	struct cmsghdr *cmsg_hdr;
+	int frag_size = 1200;
+	int send_len = 9000;
+	struct iovec iov[2];
+	int msg_len = 4500;
+	struct msghdr msg;
+	char *test_str_1;
+	char *test_str_2;
+	char *buf_1;
+	char *buf_2;
+	int i;
+
+	test_str_1 = (char *)malloc(9000);
+	test_str_2 = (char *)malloc(9000);
+	memset(test_str_1, 0, 9000);
+	memset(test_str_2, 0, 9000);
+
+	buf_1 = (char *)malloc(10000);
+	buf_2 = (char *)malloc(10000);
+	for (i = 0; i < 9000; i += (1200 - 16)) {
+		test_str_1[i] = 0x40;
+		memcpy(&test_str_1[i + 1], &variant->conn_id_1,
+		       variant->conn_id_1_len);
+		test_str_1[i + 1 + variant->conn_id_1_len] = 0xca;
+
+		test_str_2[i] = 0x40;
+		memcpy(&test_str_2[i + 1], &variant->conn_id_2,
+		       variant->conn_id_2_len);
+		test_str_2[i + 1 + variant->conn_id_2_len] = 0xca;
+	}
+
+	// program the connection into the offload
+	conn_1_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_1_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_1_info.key.dst_conn_id_length = variant->conn_id_1_len;
+	memcpy(conn_1_info.key.dst_conn_id,
+	       &variant->conn_id_1,
+	       variant->conn_id_1_len);
+	conn_1_info.conn_payload_key_gen = 0;
+
+	if (self->client_1.addr.sin_family == AF_INET) {
+		memcpy(&conn_1_info.key.addr.ipv4_addr,
+		       &self->client_1.addr.sin_addr, sizeof(struct in_addr));
+		conn_1_info.key.udp_port = self->client_1.addr.sin_port;
+	} else {
+		memcpy(&conn_1_info.key.addr.ipv6_addr,
+		       &self->client_1.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_1_info.key.udp_port = self->client_1.addr6.sin6_port;
+	}
+
+	conn_2_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_2_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_2_info.key.dst_conn_id_length = variant->conn_id_2_len;
+	memcpy(conn_2_info.key.dst_conn_id,
+	       &variant->conn_id_2,
+	       variant->conn_id_2_len);
+	conn_2_info.conn_payload_key_gen = 0;
+
+	if (self->client_2.addr.sin_family == AF_INET) {
+		memcpy(&conn_2_info.key.addr.ipv4_addr,
+		       &self->client_2.addr.sin_addr, sizeof(struct in_addr));
+		conn_2_info.key.udp_port = self->client_2.addr.sin_port;
+	} else {
+		memcpy(&conn_2_info.key.addr.ipv6_addr,
+		       &self->client_2.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_2_info.key.udp_port = self->client_2.addr6.sin6_port;
+	}
+
+	memcpy(&conn_1_info.aes_gcm_128.payload_key,
+	       &variant->conn_1_key, 16);
+	memcpy(&conn_1_info.aes_gcm_128.payload_iv,
+	       &variant->conn_1_iv, 12);
+	memcpy(&conn_1_info.aes_gcm_128.header_key,
+	       &variant->conn_1_hdr_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_key,
+	       &variant->conn_2_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_iv,
+	       &variant->conn_2_iv, 12);
+	memcpy(&conn_2_info.aes_gcm_128.header_key,
+	       &variant->conn_2_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+
+	if (variant->setup_flow_1)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)), 0);
+
+	if (variant->setup_flow_2)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)), 0);
+
+	recv_addr_len_1 = self->len_c1;
+	recv_addr_len_2 = self->len_c2;
+
+	iov[0].iov_base = test_str_1;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_1 + 4500;
+	iov[1].iov_len = msg_len;
+
+	msg.msg_name = (self->client_1.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_1.addr
+		       : (void *)&self->client_1.addr6;
+	msg.msg_namelen = self->len_c1;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->flags = 0;
+	anc_data->dst_conn_id_length = variant->conn_id_1_len;
+
+	if (variant->use_client_1)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	iov[0].iov_base = test_str_2;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_2 + 4500;
+	iov[1].iov_len = msg_len;
+	msg.msg_name = (self->client_2.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_2.addr
+		       : (void *)&self->client_2.addr6;
+	msg.msg_namelen = self->len_c2;
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->dst_conn_id_length = variant->conn_id_2_len;
+	anc_data->flags = 0;
+
+	if (variant->use_client_2)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		if (variant->af_client_1 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr,
+						   &recv_addr_len_1),
+					  1200);
+				// Validate framing is intact.
+				EXPECT_EQ(memcmp((void *)buf_1 + 1,
+						 &variant->conn_id_1,
+						 variant->conn_id_1_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr6,
+						   &recv_addr_len_1),
+					1200);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr6,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_1, test_str_1, send_len), 0);
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		if (variant->af_client_2 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr6,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr6,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_2, test_str_2, send_len), 0);
+	}
+
+	if (variant->use_client_1 && variant->use_client_2)
+		EXPECT_NE(memcmp(buf_1, buf_2, send_len), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	if (variant->setup_flow_1) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)),
+			  0);
+	}
+	if (variant->setup_flow_2) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)),
+			  0);
+	}
+	free(test_str_1);
+	free(test_str_2);
+	free(buf_1);
+	free(buf_2);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+// 3. QUIC Encryption Tests
+
+FIXTURE(quic_crypto)
+{
+	int sfd, cfd;
+	socklen_t len_c;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_crypto)
+{
+	unsigned int af_client;
+	char *client_address;
+	unsigned short client_port;
+	uint32_t algo;
+	size_t conn_key_len;
+	uint8_t conn_id[8];
+	union {
+		uint8_t conn_key_16[16];
+		uint8_t conn_key_32[32];
+	} conn_key;
+	uint8_t conn_iv[12];
+	union {
+		uint8_t conn_hdr_key_16[16];
+		uint8_t conn_hdr_key_32[32];
+	} conn_hdr_key;
+	size_t conn_id_len;
+	bool setup_flow;
+	bool use_client;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+	char plain[128];
+	size_t plain_len;
+	char match[128];
+	size_t match_len;
+	uint32_t next_pkt_num;
+};
+
+FIXTURE_SETUP(quic_crypto)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client == AF_INET) {
+		self->len_c = sizeof(self->client.addr);
+		self->client.addr.sin_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr.sin_addr);
+		self->client.addr.sin_port = htons(variant->client_port);
+	} else {
+		self->len_c = sizeof(self->client.addr6);
+		self->client.addr6.sin6_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr6.sin6_addr);
+		self->client.addr6.sin6_port = htons(variant->client_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
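+	// The ns11 (client) and ns2 (server) namespaces are created by quic.sh.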
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client) {
+		ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+		self->cfd = socket(variant->af_client, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->cfd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client == AF_INET) {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr,
+					      &self->len_c), 0);
+		} else {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr6,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr6,
+					      &self->len_c), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s),
+			  0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s),
+			  0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_crypto)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	close(self->cfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4_aes_gcm_128)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7667,
+	.algo = TLS_CIPHER_AES_GCM_128,
+	.conn_key_len = 16,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {
+		.conn_key_16 = {0x87, 0x71, 0xea, 0x1d,
+				0xfb, 0xbe, 0x7a, 0x45,
+				0xbb, 0xe2, 0x7e, 0xbc,
+				0x0b, 0x53, 0x94, 0x99
+		},
+	},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {
+		.conn_hdr_key_16 = {0xc9, 0x8e, 0xfd, 0xf2,
+				    0x0b, 0x64, 0x8c, 0x57,
+				    0xb5, 0x0a, 0xb2, 0xd2,
+				    0x21, 0xd3, 0x66, 0xa5},
+	},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7669,
+	.plain = { 0x40, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0xca,
+		   // payload
+		   0x02, 0x80, 0xde, 0x40, 0x39, 0x40, 0xf6, 0x00,
+		   0x01, 0x0b, 0x00, 0x0f, 0x65, 0x63, 0x68, 0x6f,
+		   0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+		   0x37, 0x38, 0x39
+	},
+	.plain_len = 37,
+	.match = {
+		   0x46, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0x1c, 0x44, 0xb8, 0x41, 0xbb, 0xcf, 0x6e,
+		   0x0a, 0x2a, 0x24, 0xfb, 0xb4, 0x79, 0x62, 0xea,
+		   0x59, 0x38, 0x1a, 0x0e, 0x50, 0x1e, 0x59, 0xed,
+		   0x3f, 0x8e, 0x7e, 0x5a, 0x70, 0xe4, 0x2a, 0xbc,
+		   0x2a, 0xfa, 0x2b, 0x54, 0xeb, 0x89, 0xc3, 0x2c,
+		   0xb6, 0x8c, 0x1e, 0xab, 0x2d
+	},
+	.match_len = 53,
+	.next_pkt_num = 0x0d65c9,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4_chacha20_poly1305)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7801,
+	.algo = TLS_CIPHER_CHACHA20_POLY1305,
+	.conn_key_len = 32,
+	.conn_id = {},
+	.conn_id_len = 0,
+	.conn_key = {
+		.conn_key_32 = {
+			0x3b, 0xfc, 0xdd, 0xd7, 0x2b, 0xcf, 0x02, 0x54,
+			0x1d, 0x7f, 0xa0, 0xdd, 0x1f, 0x5f, 0x9e, 0xee,
+			0xa8, 0x17, 0xe0, 0x9a, 0x69, 0x63, 0xa0, 0xe6,
+			0xc7, 0xdf, 0x0f, 0x9a, 0x1b, 0xab, 0x90, 0xf2,
+		},
+	},
+	.conn_iv = {
+		0xa6, 0xb5, 0xbc, 0x6a, 0xb7, 0xda, 0xfc, 0xe3,
+		0x0f, 0xff, 0xf5, 0xdd,
+	},
+	.conn_hdr_key = {
+		.conn_hdr_key_32 = {
+			0xd6, 0x59, 0x76, 0x0d, 0x2b, 0xa4, 0x34, 0xa2,
+			0x26, 0xfd, 0x37, 0xb3, 0x5c, 0x69, 0xe2, 0xda,
+			0x82, 0x11, 0xd1, 0x0c, 0x4f, 0x12, 0x53, 0x87,
+			0x87, 0xd6, 0x56, 0x45, 0xd5, 0xd1, 0xb8, 0xe2,
+		},
+	},
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7802,
+	.plain = { 0x42, 0x00, 0xbf, 0xf4, 0x01 },
+	.plain_len = 5,
+	.match = { 0x55, 0x58, 0xb1, 0xc6, 0x0a, 0xe7, 0xb6, 0xb9,
+		   0x32, 0xbc, 0x27, 0xd7, 0x86, 0xf4, 0xbc, 0x2b,
+		   0xb2, 0x0f, 0x21, 0x62, 0xba },
+	.match_len = 21,
+	.next_pkt_num = 0x2700bff5,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6_aes_gcm_128)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7673,
+	.algo = TLS_CIPHER_AES_GCM_128,
+	.conn_key_len = 16,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {
+		.conn_key_16 = {0x87, 0x71, 0xea, 0x1d,
+				0xfb, 0xbe, 0x7a, 0x45,
+				0xbb, 0xe2, 0x7e, 0xbc,
+				0x0b, 0x53, 0x94, 0x99
+		},
+	},
+	.conn_iv = {0x3a, 0xa7, 0x46, 0x72, 0xe9, 0x83, 0x6b, 0x55, 0xda,
+		0x66, 0x7b, 0xda},
+	.conn_hdr_key = {
+		.conn_hdr_key_16 = {0xc9, 0x8e, 0xfd, 0xf2,
+				    0x0b, 0x64, 0x8c, 0x57,
+				    0xb5, 0x0a, 0xb2, 0xd2,
+				    0x21, 0xd3, 0x66, 0xa5},
+	},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7675,
+	.plain = { 0x40, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0xca,
+		   // Payload
+		   0x02, 0x80, 0xde, 0x40, 0x39, 0x40, 0xf6, 0x00,
+		   0x01, 0x0b, 0x00, 0x0f, 0x65, 0x63, 0x68, 0x6f,
+		   0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+		   0x37, 0x38, 0x39
+	},
+	.plain_len = 37,
+	.match = {
+		   0x46, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0x1c, 0x44, 0xb8, 0x41, 0xbb, 0xcf, 0x6e,
+		   0x0a, 0x2a, 0x24, 0xfb, 0xb4, 0x79, 0x62, 0xea,
+		   0x59, 0x38, 0x1a, 0x0e, 0x50, 0x1e, 0x59, 0xed,
+		   0x3f, 0x8e, 0x7e, 0x5a, 0x70, 0xe4, 0x2a, 0xbc,
+		   0x2a, 0xfa, 0x2b, 0x54, 0xeb, 0x89, 0xc3, 0x2c,
+		   0xb6, 0x8c, 0x1e, 0xab, 0x2d
+	},
+	.match_len = 53,
+	.next_pkt_num = 0x0d65c9,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6_chacha20_poly1305)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7803,
+	.algo = TLS_CIPHER_CHACHA20_POLY1305,
+	.conn_key_len = 32,
+	.conn_id = {},
+	.conn_id_len = 0,
+	.conn_key = {
+		.conn_key_32 = {
+			0x3b, 0xfc, 0xdd, 0xd7, 0x2b, 0xcf, 0x02, 0x54,
+			0x1d, 0x7f, 0xa0, 0xdd, 0x1f, 0x5f, 0x9e, 0xee,
+			0xa8, 0x17, 0xe0, 0x9a, 0x69, 0x63, 0xa0, 0xe6,
+			0xc7, 0xdf, 0x0f, 0x9a, 0x1b, 0xab, 0x90, 0xf2,
+		},
+	},
+	.conn_iv = {
+		0xa6, 0xb5, 0xbc, 0x6a, 0xb7, 0xda, 0xfc, 0xe3,
+		0x0f, 0xff, 0xf5, 0xdd,
+	},
+	.conn_hdr_key = {
+		.conn_hdr_key_32 = {
+			0xd6, 0x59, 0x76, 0x0d, 0x2b, 0xa4, 0x34, 0xa2,
+			0x26, 0xfd, 0x37, 0xb3, 0x5c, 0x69, 0xe2, 0xda,
+			0x82, 0x11, 0xd1, 0x0c, 0x4f, 0x12, 0x53, 0x87,
+			0x87, 0xd6, 0x56, 0x45, 0xd5, 0xd1, 0xb8, 0xe2,
+		},
+	},
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7804,
+	.plain = { 0x42, 0x00, 0xbf, 0xf4, 0x01 },
+	.plain_len = 5,
+	.match = { 0x55, 0x58, 0xb1, 0xc6, 0x0a, 0xe7, 0xb6, 0xb9,
+		   0x32, 0xbc, 0x27, 0xd7, 0x86, 0xf4, 0xbc, 0x2b,
+		   0xb2, 0x0f, 0x21, 0x62, 0xba },
+	.match_len = 21,
+	.next_pkt_num = 0x2700bff5,
+};
+
+TEST_F(quic_crypto, encrypt_test_vector_single_flow_gso_in_control)
+{
+	uint8_t cmsg_buf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))
+			 + CMSG_SPACE(sizeof(uint16_t))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	uint16_t frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	int wrong_frag_size = 26;
+	socklen_t recv_addr_len;
+	struct iovec iov;
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(9000);
+	conn_info.cipher_type = variant->algo;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.dst_conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.dst_conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	conn_info.conn_payload_key_gen = 0;
+
+	if (self->client.addr.sin_family == AF_INET) {
+		memcpy(&conn_info.key.addr.ipv4_addr,
+		       &self->client.addr.sin_addr, sizeof(struct in_addr));
+		conn_info.key.udp_port = self->client.addr.sin_port;
+	} else {
+		memcpy(&conn_info.key.addr.ipv6_addr,
+		       &self->client.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_info.key.udp_port = self->client.addr6.sin6_port;
+	}
+
+	ASSERT_TRUE(variant->algo == TLS_CIPHER_AES_GCM_128 ||
+		    variant->algo == TLS_CIPHER_CHACHA20_POLY1305);
+	switch (variant->algo) {
+	case TLS_CIPHER_AES_GCM_128:
+		memcpy(&conn_info.aes_gcm_128.payload_key,
+		       &variant->conn_key, 16);
+		memcpy(&conn_info.aes_gcm_128.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.aes_gcm_128.header_key,
+		       &variant->conn_hdr_key, 16);
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		memcpy(&conn_info.chacha20_poly1305.payload_key,
+		       &variant->conn_key, 32);
+		memcpy(&conn_info.chacha20_poly1305.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.chacha20_poly1305.header_key,
+		       &variant->conn_hdr_key, 32);
+		break;
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
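+	// Deliberately set an unusable socket-level UDP_SEGMENT value; the
+	// UDP_SEGMENT cmsg attached to the sendmsg() below should take precedence.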
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &wrong_frag_size,
+			     sizeof(wrong_frag_size)), 0);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+	iov.iov_base = (void *)variant->plain;
+	iov.iov_len = variant->plain_len;
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(struct quic_tx_ancillary_data));
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = variant->next_pkt_num;
+	anc_data->dst_conn_id_length = variant->conn_id_len;
+	cmsg_hdr = CMSG_NXTHDR(&msg, cmsg_hdr);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_SEGMENT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+	memcpy(CMSG_DATA(cmsg_hdr), (void *)&frag_size, sizeof(frag_size));
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), variant->plain_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr, &recv_addr_len),
+			  variant->match_len);
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  variant->match_len);
+	}
+	EXPECT_EQ(memcmp(buf, variant->match, variant->match_len), 0);
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_crypto, encrypt_test_vector_single_flow_gso_in_setsockopt)
+{
+	uint8_t cmsg_buf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	socklen_t recv_addr_len;
+	struct iovec iov;
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(9000);
+	conn_info.cipher_type = variant->algo;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.dst_conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.dst_conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	conn_info.conn_payload_key_gen = 0;
+
+	if (self->client.addr.sin_family == AF_INET) {
+		memcpy(&conn_info.key.addr.ipv4_addr,
+		       &self->client.addr.sin_addr, sizeof(struct in_addr));
+		conn_info.key.udp_port = self->client.addr.sin_port;
+	} else {
+		memcpy(&conn_info.key.addr.ipv6_addr,
+		       &self->client.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_info.key.udp_port = self->client.addr6.sin6_port;
+	}
+	ASSERT_TRUE(variant->algo == TLS_CIPHER_AES_GCM_128 ||
+		    variant->algo == TLS_CIPHER_CHACHA20_POLY1305);
+	switch (variant->algo) {
+	case TLS_CIPHER_AES_GCM_128:
+		memcpy(&conn_info.aes_gcm_128.payload_key,
+		       &variant->conn_key, 16);
+		memcpy(&conn_info.aes_gcm_128.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.aes_gcm_128.header_key,
+		       &variant->conn_hdr_key, 16);
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		memcpy(&conn_info.chacha20_poly1305.payload_key,
+		       &variant->conn_key, 32);
+		memcpy(&conn_info.chacha20_poly1305.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.chacha20_poly1305.header_key,
+		       &variant->conn_hdr_key, 32);
+		break;
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
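+	// In this variant the GSO size comes only from setsockopt(); no
+	// UDP_SEGMENT cmsg is attached to the sendmsg() call below.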
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+	iov.iov_base = (void *)variant->plain;
+	iov.iov_len = variant->plain_len;
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(struct quic_tx_ancillary_data));
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = variant->next_pkt_num;
+	anc_data->dst_conn_id_length = variant->conn_id_len;
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), variant->plain_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr, &recv_addr_len),
+			  variant->match_len);
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  variant->match_len);
+	}
+	EXPECT_EQ(memcmp(buf, variant->match, variant->match_len), 0);
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/quic.sh b/tools/testing/selftests/net/quic.sh
new file mode 100755
index 000000000000..8ff8bc494671
--- /dev/null
+++ b/tools/testing/selftests/net/quic.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
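+# Set up three network namespaces (clients ns11/ns12 and server ns2) joined
+# by a bridge, with IPv4, IPv4-mapped and IPv6 addresses for the selftests.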
+sudo ip netns add ns11
+sudo ip netns add ns12
+sudo ip netns add ns2
+sudo ip link add veth11 type veth peer name br-veth11
+sudo ip link add veth12 type veth peer name br-veth12
+sudo ip link add veth2 type veth peer name br-veth2
+sudo ip link set veth11 netns ns11
+sudo ip link set veth12 netns ns12
+sudo ip link set veth2 netns ns2
+sudo ip netns exec ns11 ip addr add 10.0.0.1/24 dev veth11
+sudo ip netns exec ns11 ip addr add ::ffff:10.0.0.1/96 dev veth11
+sudo ip netns exec ns11 ip addr add 2001::1/64 dev veth11
+sudo ip netns exec ns12 ip addr add 10.0.0.3/24 dev veth12
+sudo ip netns exec ns12 ip addr add ::ffff:10.0.0.3/96 dev veth12
+sudo ip netns exec ns12 ip addr add 2001::3/64 dev veth12
+sudo ip netns exec ns2 ip addr add 10.0.0.2/24 dev veth2
+sudo ip netns exec ns2 ip addr add ::ffff:10.0.0.2/96 dev veth2
+sudo ip netns exec ns2 ip addr add 2001::2/64 dev veth2
+sudo ip link add name br1 type bridge forward_delay 0
+sudo ip link set br1 up
+sudo ip link set br-veth11 up
+sudo ip link set br-veth12 up
+sudo ip link set br-veth2 up
+sudo ip netns exec ns11 ip link set veth11 up
+sudo ip netns exec ns12 ip link set veth12 up
+sudo ip netns exec ns2 ip link set veth2 up
+sudo ip link set br-veth11 master br1
+sudo ip link set br-veth12 master br1
+sudo ip link set br-veth2 master br1
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+
+printf "%s" "Waiting for bridge to start fowarding ..."
+while ! timeout 0.5 sudo ip netns exec ns2 ping -c 1 -n 2001::1 &> /dev/null
+do
+	printf "%c" "."
+done
+printf "\n%s\n"  "Bridge is operational"
+
+sudo ./quic
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+sudo ip netns delete ns2
+sudo ip netns delete ns12
+sudo ip netns delete ns11
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [net-next v3 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-09-07  0:49   ` [net-next v3 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-09-07  3:38     ` Bagas Sanjaya
  2022-09-07 17:29       ` Adel Abouchaev
  0 siblings, 1 reply; 77+ messages in thread
From: Bagas Sanjaya @ 2022-09-07  3:38 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, netdev,
	linux-doc, linux-kselftest, kernel test robot

[-- Attachment #1: Type: text/plain, Size: 16345 bytes --]

On Tue, Sep 06, 2022 at 05:49:30PM -0700, Adel Abouchaev wrote:
> +===========
> +KERNEL QUIC
> +===========
> +
> +Overview
> +========
> +
> +QUIC is a secure general-purpose transport protocol that creates a stateful
> +interaction between a client and a server. QUIC provides end-to-end integrity
> +and confidentiality. Refer to RFC 9000 for more information on QUIC.
> +
> +The kernel Tx side offload covers the encryption of the application streams
> +in the kernel rather than in the application. These packets are 1RTT packets
> +in QUIC connection. Encryption of every other packets is still done by the
> +QUIC library in user space.
> +
> +The flow match is performed using 5 parameters: source and destination IP
> +addresses, source and destination UDP ports and destination QUIC connection ID.
> +Not all 5 parameters are always needed. The Tx direction matches the flow on
> +the destination IP, port and destination connection ID, while the Rx part would
> +later match on source IP, port and destination connection ID. This will cover
> +multiple scenarios where the server is using SO_REUSEADDR and/or empty
> +destination connection IDs or combination of these.
> +

Both the Tx and Rx directions match on the destination connection ID. Is
that right?

> +The Rx direction is not implemented in this set of patches.
> +
> +The connection migration scenario is not handled by the kernel code and will
> +be handled by the user space portion of QUIC library. On the Tx direction,
> +the new key would be installed before a packet with an updated destination is
> +sent. On the Rx direction, the behavior will be to drop a packet if a flow is
> +missing.
> +
> +For the key rotation, the behavior is to drop packets on Tx when the encryption
> +key with matching key rotation bit is not present. On Rx direction, the packet
> +will be sent to the userspace library with unencrypted header and encrypted
> +payload. A separate indication will be added to the ancillary data to indicate
> +the status of the operation as not matching the current key bit. It is not
> +possible to use the key rotation bit as part of the key for flow lookup as that
> +bit is protected by the header protection. A special provision will need to be
> +done in user mode to still attempt the decryption of the payload to prevent a
> +timing attack.
> +
> +
> +User Interface
> +==============
> +
> +Creating a QUIC connection
> +--------------------------
> +
> +QUIC connection originates and terminates in the application, using one of many
> +available QUIC libraries. The code instantiates QUIC client and QUIC server in
> +some form and configures them to use certain addresses and ports for the
> +source and destination. The client and server negotiate the set of keys to
> +protect the communication during different phases of the connection, maintain
> +the connection and perform congestion control.
> +
> +Requesting to add QUIC Tx kernel encryption to the connection
> +-------------------------------------------------------------
> +
> +Each flow that should be encrypted by the kernel needs to be registered with
> +the kernel using socket API. A setsockopt() call on the socket creates an
> +association between the QUIC connection ID of the flow with the encryption
> +parameters for the crypto operations:
> +
> +.. code-block:: c
> +
> +	struct quic_connection_info conn_info;
> +	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
> +	const size_t conn_id_len = sizeof(conn_id);
> +	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> +			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
> +	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> +			    0x08, 0x09, 0x0a, 0x0b};
> +	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
> +				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
> +				};
> +
> +        conn_info.conn_payload_key_gen = 0;
> +	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
> +
> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
> +	conn_info.key.conn_id_length = 5;
> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
> +				      - conn_id_len],
> +	       &conn_id, conn_id_len);
> +
> +	memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
> +	memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
> +	memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
> +
> +	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
> +		   sizeof(conn_info));
> +
> +
> +Requesting to remove QUIC Tx kernel crypto offload control messages
> +-------------------------------------------------------------------
> +
> +All flows are removed when the socket is closed. To request an explicit remove
> +of the offload for the connection during the lifetime of the socket the process
> +is similar to adding the flow. Only the connection ID and its length are
> +necessary to supply to remove the connection from the offload:
> +
> +.. code-block:: c
> +
> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
> +	conn_info.key.conn_id_length = 5;
> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
> +				      - conn_id_len],
> +	       &conn_id, conn_id_len);
> +	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
> +		   sizeof(conn_info));
> +
> +Sending QUIC application data
> +-----------------------------
> +
> +For QUIC Tx encryption offload, the application should use sendmsg() socket
> +call and provide ancillary data with information on connection ID length and
> +offload flags for the kernel to perform the encryption and GSO support if
> +requested.
> +
> +.. code-block:: c
> +
> +	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
> +	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
> +	struct quic_tx_ancillary_data * anc_data;
> +	size_t quic_data_len = 4500;
> +	struct cmsghdr * cmsg_hdr;
> +	char quic_data[9000];
> +	struct iovec iov[2];
> +	int send_len = 9000;
> +	struct msghdr msg;
> +	int err;
> +
> +	iov[0].iov_base = quic_data;
> +	iov[0].iov_len = quic_data_len;
> +	iov[1].iov_base = quic_data + 4500;
> +	iov[1].iov_len = quic_data_len;
> +
> +	if (client.addr.sin_family == AF_INET) {
> +		msg.msg_name = &client.addr;
> +		msg.msg_namelen = sizeof(client.addr);
> +	} else {
> +		msg.msg_name = &client.addr6;
> +		msg.msg_namelen = sizeof(client.addr6);
> +	}
> +
> +	msg.msg_iov = iov;
> +	msg.msg_iovlen = 2;
> +	msg.msg_control = cmsg_buf;
> +	msg.msg_controllen = sizeof(cmsg_buf);
> +	cmsg_hdr = CMSG_FIRSTHDR(&msg);
> +	cmsg_hdr->cmsg_level = IPPROTO_UDP;
> +	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
> +	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
> +	anc_data = CMSG_DATA(cmsg_hdr);
> +	anc_data->flags = 0;
> +	anc_data->next_pkt_num = 0x0d65c9;
> +	anc_data->conn_id_length = conn_id_len;
> +	err = sendmsg(self->sfd, &msg, 0);
> +
> +QUIC Tx offload in kernel will read the data from userspace, encrypt and
> +copy it to the ciphertext within the same operation.
> +
> +
> +Sending QUIC application data with GSO
> +--------------------------------------
> +When GSO is in use, the kernel will use the GSO fragment size as the target
> +for ciphertext. The packets from the user space should align on the boundary
> +of GSO fragment size minus the size of the tag for the chosen cipher. For the
> +GSO fragment 1200, the plain packets should follow each other at every 1184
> +bytes, given the tag size of 16. After the encryption, the rest of the UDP
> +and IP stacks will follow the defined value of GSO fragment which will include
> +the trailing tag bytes.
> +
> +To set up GSO fragmentation:
> +
> +.. code-block:: c
> +
> +	setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
> +		   sizeof(frag_size));
> +
> +If the GSO fragment size is provided in ancillary data within the sendmsg()
> +call, the value in ancillary data will take precedence over the segment size
> +provided in setsockopt to split the payload into packets. This is consistent
> +with the UDP stack behavior.
> +
> +Integrating to userspace QUIC libraries
> +---------------------------------------
> +
> +Userspace QUIC libraries integration would depend on the implementation of the
> +QUIC protocol. For MVFST library, the control plane is integrated into the
> +handshake callbacks to properly configure the flows into the socket; and the
> +data plane is integrated into the methods that perform encryption and send
> +the packets to the batch scheduler for transmissions to the socket.
> +
> +MVFST library can be found at https://github.com/facebookincubator/mvfst.
> +
> +Statistics
> +==========
> +
> +QUIC Tx offload to the kernel has counters
> +(``/proc/net/quic_stat``):
> +
> +- ``QuicCurrTxSw`` -
> +  number of currently active kernel offloaded QUIC connections
> +- ``QuicTxSw`` -
> +  accumulative total number of offloaded QUIC connections
> +- ``QuicTxSwError`` -
> +  accumulative total number of errors during QUIC Tx offload to kernel

The rest of the documentation can be improved, like:

---- >8 ----

diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
index 2e6ec72f4eea3a..3f3d05b901da3f 100644
--- a/Documentation/networking/quic.rst
+++ b/Documentation/networking/quic.rst
@@ -9,22 +9,22 @@ Overview
 
 QUIC is a secure general-purpose transport protocol that creates a stateful
 interaction between a client and a server. QUIC provides end-to-end integrity
-and confidentiality. Refer to RFC 9000 for more information on QUIC.
+and confidentiality. Refer to RFC 9000 [#rfc9000]_ for the standard document.
 
 The kernel Tx side offload covers the encryption of the application streams
 in the kernel rather than in the application. These packets are 1RTT packets
 in QUIC connection. Encryption of every other packets is still done by the
-QUIC library in user space.
+QUIC library in userspace.
 
 The flow match is performed using 5 parameters: source and destination IP
 addresses, source and destination UDP ports and destination QUIC connection ID.
-Not all 5 parameters are always needed. The Tx direction matches the flow on
-the destination IP, port and destination connection ID, while the Rx part would
-later match on source IP, port and destination connection ID. This will cover
-multiple scenarios where the server is using SO_REUSEADDR and/or empty
-destination connection IDs or combination of these.
+Not all these parameters are always needed. The Tx direction matches the flow
+on the destination IP, port and destination connection ID; while the Rx
+direction would later match on source IP, port and destination connection ID.
+This will cover multiple scenarios where the server is using ``SO_REUSEADDR``
+and/or empty destination connection IDs or combination of these.
 
-The Rx direction is not implemented in this set of patches.
+The Rx direction is not implemented yet.
 
 The connection migration scenario is not handled by the kernel code and will
 be handled by the user space portion of QUIC library. On the Tx direction,
@@ -39,8 +39,8 @@ payload. A separate indication will be added to the ancillary data to indicate
 the status of the operation as not matching the current key bit. It is not
 possible to use the key rotation bit as part of the key for flow lookup as that
 bit is protected by the header protection. A special provision will need to be
-done in user mode to still attempt the decryption of the payload to prevent a
-timing attack.
+done in user mode to keep attempting the payload decryption to prevent timing
+attacks.
 
 
 User Interface
@@ -50,7 +50,7 @@ Creating a QUIC connection
 --------------------------
 
 QUIC connection originates and terminates in the application, using one of many
-available QUIC libraries. The code instantiates QUIC client and QUIC server in
+available QUIC libraries. The code instantiates the client and server in
 some form and configures them to use certain addresses and ports for the
 source and destination. The client and server negotiate the set of keys to
 protect the communication during different phases of the connection, maintain
@@ -60,7 +60,7 @@ Requesting to add QUIC Tx kernel encryption to the connection
 -------------------------------------------------------------
 
 Each flow that should be encrypted by the kernel needs to be registered with
-the kernel using socket API. A setsockopt() call on the socket creates an
+the kernel using socket API. A ``setsockopt()`` call on the socket creates an
 association between the QUIC connection ID of the flow with the encryption
 parameters for the crypto operations:
 
@@ -112,10 +112,10 @@ necessary to supply to remove the connection from the offload:
 	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
 		   sizeof(conn_info));
 
-Sending QUIC application data
------------------------------
+Sending application data
+------------------------
 
-For QUIC Tx encryption offload, the application should use sendmsg() socket
+For Tx encryption offload, the application should use ``sendmsg()`` socket
 call and provide ancillary data with information on connection ID length and
 offload flags for the kernel to perform the encryption and GSO support if
 requested.
@@ -168,11 +168,11 @@ Sending QUIC application data with GSO
 --------------------------------------
 When GSO is in use, the kernel will use the GSO fragment size as the target
 for ciphertext. The packets from the user space should align on the boundary
-of GSO fragment size minus the size of the tag for the chosen cipher. For the
-GSO fragment 1200, the plain packets should follow each other at every 1184
-bytes, given the tag size of 16. After the encryption, the rest of the UDP
-and IP stacks will follow the defined value of GSO fragment which will include
-the trailing tag bytes.
+of the fragment size minus the tag size for the chosen cipher. For example,
+if the fragment size is 1200 bytes and the tag size is 16 bytes, the plain
+packets should follow each other at every 1184 bytes. After the encryption,
+the rest of UDP and IP stacks will follow the defined value of the fragment,
+which includes the trailing tag bytes.
 
 To set up GSO fragmentation:
 
@@ -181,7 +181,7 @@ To set up GSO fragmentation:
 	setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
 		   sizeof(frag_size));
 
-If the GSO fragment size is provided in ancillary data within the sendmsg()
+If the fragment size is provided in ancillary data within the ``sendmsg()``
 call, the value in ancillary data will take precedence over the segment size
 provided in setsockopt to split the payload into packets. This is consistent
 with the UDP stack behavior.
@@ -190,12 +190,10 @@ Integrating to userspace QUIC libraries
 ---------------------------------------
 
 Userspace QUIC libraries integration would depend on the implementation of the
-QUIC protocol. For MVFST library, the control plane is integrated into the
-handshake callbacks to properly configure the flows into the socket; and the
-data plane is integrated into the methods that perform encryption and send
-the packets to the batch scheduler for transmissions to the socket.
-
-MVFST library can be found at https://github.com/facebookincubator/mvfst.
+QUIC protocol. For MVFST library [#mvfst]_, the control plane is integrated
+into the handshake callbacks to properly configure the flows into the socket;
+and the data plane is integrated into the methods that perform encryption
+and send the packets to the batch scheduler for transmissions to the socket.
 
 Statistics
 ==========
@@ -209,3 +207,9 @@ QUIC Tx offload to the kernel has counters
   accumulative total number of offloaded QUIC connections
 - ``QuicTxSwError`` -
   accumulative total number of errors during QUIC Tx offload to kernel
+
+References
+==========
+
+.. [#rfc9000] https://datatracker.ietf.org/doc/html/rfc9000
+.. [#mvfst] https://github.com/facebookincubator/mvfst

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [net-next v3 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-09-07  3:38     ` Bagas Sanjaya
@ 2022-09-07 17:29       ` Adel Abouchaev
  0 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-07 17:29 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, netdev,
	linux-doc, linux-kselftest, kernel test robot


On 9/6/22 8:38 PM, Bagas Sanjaya wrote:
> On Tue, Sep 06, 2022 at 05:49:30PM -0700, Adel Abouchaev wrote:
>> +===========
>> +KERNEL QUIC
>> +===========
>> +
>> +Overview
>> +========
>> +
>> +QUIC is a secure general-purpose transport protocol that creates a stateful
>> +interaction between a client and a server. QUIC provides end-to-end integrity
>> +and confidentiality. Refer to RFC 9000 for more information on QUIC.
>> +
>> +The kernel Tx side offload covers the encryption of the application streams
>> +in the kernel rather than in the application. These packets are 1RTT packets
>> +in QUIC connection. Encryption of every other packets is still done by the
>> +QUIC library in user space.
>> +
>> +The flow match is performed using 5 parameters: source and destination IP
>> +addresses, source and destination UDP ports and destination QUIC connection ID.
>> +Not all 5 parameters are always needed. The Tx direction matches the flow on
>> +the destination IP, port and destination connection ID, while the Rx part would
>> +later match on source IP, port and destination connection ID. This will cover
>> +multiple scenarios where the server is using SO_REUSEADDR and/or empty
>> +destination connection IDs or combination of these.
>> +
> Both the Tx and Rx directions match on the destination connection ID. Is
> that right?

That is correct. The QUIC packet only carries the destination CID in its
header. Although the Tx direction could have ancillary data carrying the
source CID, it is not required by any viable use case scenario.
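
For reference, a rough sketch of the short header layout this relies on
(illustrative only, not a structure from the patches; the CID length is
not encoded in the header and is known only from connection state):

	struct quic_short_header_example {
		uint8_t first_byte;      /* form, spin, key phase, pn length bits */
		uint8_t dst_conn_id[8];  /* destination CID, 8 bytes as an example */
		/* A 1-4 byte protected packet number and the payload follow;
		 * a short header carries no source CID field at all.
		 */
	};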

Thank you for looking at the doc, I will add the documentation changes
into the v4 update.

>
>> +The Rx direction is not implemented in this set of patches.
>> +
>> +The connection migration scenario is not handled by the kernel code and will
>> +be handled by the user space portion of QUIC library. On the Tx direction,
>> +the new key would be installed before a packet with an updated destination is
>> +sent. On the Rx direction, the behavior will be to drop a packet if a flow is
>> +missing.
>> +
>> +For the key rotation, the behavior is to drop packets on Tx when the encryption
>> +key with matching key rotation bit is not present. On Rx direction, the packet
>> +will be sent to the userspace library with unencrypted header and encrypted
>> +payload. A separate indication will be added to the ancillary data to indicate
>> +the status of the operation as not matching the current key bit. It is not
>> +possible to use the key rotation bit as part of the key for flow lookup as that
>> +bit is protected by the header protection. A special provision will need to be
>> +done in user mode to still attempt the decryption of the payload to prevent a
>> +timing attack.
>> +
>> +
>> +User Interface
>> +==============
>> +
>> +Creating a QUIC connection
>> +--------------------------
>> +
>> +QUIC connection originates and terminates in the application, using one of many
>> +available QUIC libraries. The code instantiates QUIC client and QUIC server in
>> +some form and configures them to use certain addresses and ports for the
>> +source and destination. The client and server negotiate the set of keys to
>> +protect the communication during different phases of the connection, maintain
>> +the connection and perform congestion control.
>> +
>> +Requesting to add QUIC Tx kernel encryption to the connection
>> +-------------------------------------------------------------
>> +
>> +Each flow that should be encrypted by the kernel needs to be registered with
>> +the kernel using socket API. A setsockopt() call on the socket creates an
>> +association between the QUIC connection ID of the flow with the encryption
>> +parameters for the crypto operations:
>> +
>> +.. code-block:: c
>> +
>> +	struct quic_connection_info conn_info;
>> +	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
>> +	const size_t conn_id_len = sizeof(conn_id);
>> +	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
>> +			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
>> +	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
>> +			    0x08, 0x09, 0x0a, 0x0b};
>> +	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
>> +				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
>> +				};
>> +
>> +        conn_info.conn_payload_key_gen = 0;
>> +	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
>> +
>> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
>> +	conn_info.key.conn_id_length = 5;
>> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
>> +				      - conn_id_len],
>> +	       &conn_id, conn_id_len);
>> +
>> +	memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
>> +	memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
>> +	memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
>> +
>> +	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
>> +		   sizeof(conn_info));
>> +
>> +
>> +Requesting to remove QUIC Tx kernel crypto offload control messages
>> +-------------------------------------------------------------------
>> +
>> +All flows are removed when the socket is closed. To request an explicit remove
>> +of the offload for the connection during the lifetime of the socket the process
>> +is similar to adding the flow. Only the connection ID and its length are
>> +necessary to supply to remove the connection from the offload:
>> +
>> +.. code-block:: c
>> +
>> +	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
>> +	conn_info.key.conn_id_length = 5;
>> +	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
>> +				      - conn_id_len],
>> +	       &conn_id, conn_id_len);
>> +	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
>> +		   sizeof(conn_info));
>> +
>> +Sending QUIC application data
>> +-----------------------------
>> +
>> +For QUIC Tx encryption offload, the application should use sendmsg() socket
>> +call and provide ancillary data with information on connection ID length and
>> +offload flags for the kernel to perform the encryption and GSO support if
>> +requested.
>> +
>> +.. code-block:: c
>> +
>> +	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
>> +	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
>> +	struct quic_tx_ancillary_data * anc_data;
>> +	size_t quic_data_len = 4500;
>> +	struct cmsghdr * cmsg_hdr;
>> +	char quic_data[9000];
>> +	struct iovec iov[2];
>> +	int send_len = 9000;
>> +	struct msghdr msg;
>> +	int err;
>> +
>> +	iov[0].iov_base = quic_data;
>> +	iov[0].iov_len = quic_data_len;
>> +	iov[1].iov_base = quic_data + 4500;
>> +	iov[1].iov_len = quic_data_len;
>> +
>> +	if (client.addr.sin_family == AF_INET) {
>> +		msg.msg_name = &client.addr;
>> +		msg.msg_namelen = sizeof(client.addr);
>> +	} else {
>> +		msg.msg_name = &client.addr6;
>> +		msg.msg_namelen = sizeof(client.addr6);
>> +	}
>> +
>> +	msg.msg_iov = iov;
>> +	msg.msg_iovlen = 2;
>> +	msg.msg_control = cmsg_buf;
>> +	msg.msg_controllen = sizeof(cmsg_buf);
>> +	cmsg_hdr = CMSG_FIRSTHDR(&msg);
>> +	cmsg_hdr->cmsg_level = IPPROTO_UDP;
>> +	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
>> +	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
>> +	anc_data = CMSG_DATA(cmsg_hdr);
>> +	anc_data->flags = 0;
>> +	anc_data->next_pkt_num = 0x0d65c9;
>> +	anc_data->conn_id_length = conn_id_len;
>> +	err = sendmsg(self->sfd, &msg, 0);
>> +
>> +QUIC Tx offload in kernel will read the data from userspace, encrypt and
>> +copy it to the ciphertext within the same operation.
>> +
>> +
>> +Sending QUIC application data with GSO
>> +--------------------------------------
>> +When GSO is in use, the kernel will use the GSO fragment size as the target
>> +for ciphertext. The packets from the user space should align on the boundary
>> +of GSO fragment size minus the size of the tag for the chosen cipher. For the
>> +GSO fragment 1200, the plain packets should follow each other at every 1184
>> +bytes, given the tag size of 16. After the encryption, the rest of the UDP
>> +and IP stacks will follow the defined value of GSO fragment which will include
>> +the trailing tag bytes.
>> +
>> +To set up GSO fragmentation:
>> +
>> +.. code-block:: c
>> +
>> +	setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
>> +		   sizeof(frag_size));
>> +
>> +If the GSO fragment size is provided in ancillary data within the sendmsg()
>> +call, the value in ancillary data will take precedence over the segment size
>> +provided in setsockopt to split the payload into packets. This is consistent
>> +with the UDP stack behavior.
>> +
>> +Integrating to userspace QUIC libraries
>> +---------------------------------------
>> +
>> +Userspace QUIC libraries integration would depend on the implementation of the
>> +QUIC protocol. For MVFST library, the control plane is integrated into the
>> +handshake callbacks to properly configure the flows into the socket; and the
>> +data plane is integrated into the methods that perform encryption and send
>> +the packets to the batch scheduler for transmissions to the socket.
>> +
>> +MVFST library can be found at https://github.com/facebookincubator/mvfst.
>> +
>> +Statistics
>> +==========
>> +
>> +QUIC Tx offload to the kernel has counters
>> +(``/proc/net/quic_stat``):
>> +
>> +- ``QuicCurrTxSw`` -
>> +  number of currently active kernel offloaded QUIC connections
>> +- ``QuicTxSw`` -
>> +  accumulative total number of offloaded QUIC connections
>> +- ``QuicTxSwError`` -
>> +  accumulative total number of errors during QUIC Tx offload to kernel
> The rest of the documentation can be improved, like:
>
> ---- >8 ----
>
> diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
> index 2e6ec72f4eea3a..3f3d05b901da3f 100644
> --- a/Documentation/networking/quic.rst
> +++ b/Documentation/networking/quic.rst
> @@ -9,22 +9,22 @@ Overview
>   
>   QUIC is a secure general-purpose transport protocol that creates a stateful
>   interaction between a client and a server. QUIC provides end-to-end integrity
> -and confidentiality. Refer to RFC 9000 for more information on QUIC.
> +and confidentiality. Refer to RFC 9000 [#rfc9000]_ for the standard document.
>   
>   The kernel Tx side offload covers the encryption of the application streams
>   in the kernel rather than in the application. These packets are 1RTT packets
>   in QUIC connection. Encryption of every other packets is still done by the
> -QUIC library in user space.
> +QUIC library in userspace.
>   
>   The flow match is performed using 5 parameters: source and destination IP
>   addresses, source and destination UDP ports and destination QUIC connection ID.
> -Not all 5 parameters are always needed. The Tx direction matches the flow on
> -the destination IP, port and destination connection ID, while the Rx part would
> -later match on source IP, port and destination connection ID. This will cover
> -multiple scenarios where the server is using SO_REUSEADDR and/or empty
> -destination connection IDs or combination of these.
> +Not all these parameters are always needed. The Tx direction matches the flow
> +on the destination IP, port and destination connection ID; while the Rx
> +direction would later match on source IP, port and destination connection ID.
> +This will cover multiple scenarios where the server is using ``SO_REUSEADDR``
> +and/or empty destination connection IDs or combination of these.
>   
> -The Rx direction is not implemented in this set of patches.
> +The Rx direction is not implemented yet.
>   
>   The connection migration scenario is not handled by the kernel code and will
>   be handled by the user space portion of QUIC library. On the Tx direction,
> @@ -39,8 +39,8 @@ payload. A separate indication will be added to the ancillary data to indicate
>   the status of the operation as not matching the current key bit. It is not
>   possible to use the key rotation bit as part of the key for flow lookup as that
>   bit is protected by the header protection. A special provision will need to be
> -done in user mode to still attempt the decryption of the payload to prevent a
> -timing attack.
> +done in user mode to keep attempting the payload decryption to prevent timing
> +attacks.
>   
>   
>   User Interface
> @@ -50,7 +50,7 @@ Creating a QUIC connection
>   --------------------------
>   
>   QUIC connection originates and terminates in the application, using one of many
> -available QUIC libraries. The code instantiates QUIC client and QUIC server in
> +available QUIC libraries. The code instantiates the client and server in
>   some form and configures them to use certain addresses and ports for the
>   source and destination. The client and server negotiate the set of keys to
>   protect the communication during different phases of the connection, maintain
> @@ -60,7 +60,7 @@ Requesting to add QUIC Tx kernel encryption to the connection
>   -------------------------------------------------------------
>   
>   Each flow that should be encrypted by the kernel needs to be registered with
> -the kernel using socket API. A setsockopt() call on the socket creates an
> +the kernel using socket API. A ``setsockopt()`` call on the socket creates an
>   association between the QUIC connection ID of the flow with the encryption
>   parameters for the crypto operations:
>   
> @@ -112,10 +112,10 @@ necessary to supply to remove the connection from the offload:
>   	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
>   		   sizeof(conn_info));
>   
> -Sending QUIC application data
> ------------------------------
> +Sending application data
> +------------------------
>   
> -For QUIC Tx encryption offload, the application should use sendmsg() socket
> +For Tx encryption offload, the application should use ``sendmsg()`` socket
>   call and provide ancillary data with information on connection ID length and
>   offload flags for the kernel to perform the encryption and GSO support if
>   requested.
> @@ -168,11 +168,11 @@ Sending QUIC application data with GSO
>   --------------------------------------
>   When GSO is in use, the kernel will use the GSO fragment size as the target
>   for ciphertext. The packets from the user space should align on the boundary
> -of GSO fragment size minus the size of the tag for the chosen cipher. For the
> -GSO fragment 1200, the plain packets should follow each other at every 1184
> -bytes, given the tag size of 16. After the encryption, the rest of the UDP
> -and IP stacks will follow the defined value of GSO fragment which will include
> -the trailing tag bytes.
> +of the fragment size minus the tag size for the chosen cipher. For example,
> +if the fragment size is 1200 bytes and the tag size is 16 bytes, the plain
> +packets should follow each other at every 1184 bytes. After the encryption,
> +the rest of UDP and IP stacks will follow the defined value of the fragment,
> +which includes the trailing tag bytes.
>   
>   To set up GSO fragmentation:
>   
> @@ -181,7 +181,7 @@ To set up GSO fragmentation:
>   	setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
>   		   sizeof(frag_size));
>   
> -If the GSO fragment size is provided in ancillary data within the sendmsg()
> +If the fragment size is provided in ancillary data within the ``sendmsg()``
>   call, the value in ancillary data will take precedence over the segment size
>   provided in setsockopt to split the payload into packets. This is consistent
>   with the UDP stack behavior.
> @@ -190,12 +190,10 @@ Integrating to userspace QUIC libraries
>   ---------------------------------------
>   
>   Userspace QUIC libraries integration would depend on the implementation of the
> -QUIC protocol. For MVFST library, the control plane is integrated into the
> -handshake callbacks to properly configure the flows into the socket; and the
> -data plane is integrated into the methods that perform encryption and send
> -the packets to the batch scheduler for transmissions to the socket.
> -
> -MVFST library can be found at https://github.com/facebookincubator/mvfst.
> +QUIC protocol. For MVFST library [#mvfst]_, the control plane is integrated
> +into the handshake callbacks to properly configure the flows into the socket;
> +and the data plane is integrated into the methods that perform encryption
> +and send the packets to the batch scheduler for transmissions to the socket.
>   
>   Statistics
>   ==========
> @@ -209,3 +207,9 @@ QUIC Tx offload to the kernel has counters
>     accumulative total number of offloaded QUIC connections
>   - ``QuicTxSwError`` -
>     accumulative total number of errors during QUIC Tx offload to kernel
> +
> +References
> +==========
> +
> +.. [#rfc9000] https://datatracker.ietf.org/doc/html/rfc9000
> +.. [#mvfst] https://github.com/facebookincubator/mvfst
>
> Thanks.
>
Cheers,

Adel.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [net-next v4 0/6] net: support QUIC crypto
       [not found] <Adel Abouchaev <adel.abushaev@gmail.com>
                   ` (6 preceding siblings ...)
  2022-09-07  0:49 ` [net-next v3 0/6] net: support QUIC crypto Adel Abouchaev
@ 2022-09-09  0:12 ` Adel Abouchaev
  2022-09-09  0:12   ` [net-next v4 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
                     ` (5 more replies)
  7 siblings, 6 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-09  0:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, paul,
	jmorris, serge, linux-security-module, netdev, linux-doc,
	linux-kselftest

QUIC requires end to end encryption of the data. The application usually
prepares the data in clear text, encrypts and calls send() which implies
multiple copies of the data before the packets hit the networking stack.
Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
pressure by reducing the number of copies.

The scope of kernel support is limited to the symmetric cryptography,
leaving the handshake to the user space library. For QUIC in particular,
the application packets that require symmetric cryptography are the 1RTT
packets with short headers. Kernel will encrypt the application packets
on transmission and decrypt on receive. This series implements Tx only,
because in QUIC server applications Tx outweighs Rx by orders of
magnitude.

Supporting the combination of QUIC and GSO requires the application to
correctly place the data and the kernel to correctly slice it. The
encryption process appends an arbitrary number of bytes (tag) to the end
of the message to authenticate it. The GSO value should include this
overhead, the offload would then subtract the tag size to parse the
input on Tx before chunking and encrypting it.

With the kernel cryptography, the buffer copy operation is conjoined
with the encryption operation. The memory bandwidth is reduced by 5-8%.
When devices supporting QUIC encryption in hardware come to the market,
we will be able to free further 7% of CPU utilization which is used
today for crypto operations.

Adel Abouchaev (6):
  Documentation on QUIC kernel Tx crypto.
  Define QUIC specific constants, control and data plane structures
  Add UDP ULP operations, initialization and handling prototype
    functions.
  Implement QUIC offload functions
  Add flow counters and Tx processing error counter
  Add self tests for ULP operations, flow setup and crypto tests

 Documentation/networking/index.rst     |    1 +
 Documentation/networking/quic.rst      |  215 ++++
 include/net/inet_sock.h                |    2 +
 include/net/netns/mib.h                |    3 +
 include/net/quic.h                     |   63 +
 include/net/snmp.h                     |    6 +
 include/net/udp.h                      |   33 +
 include/uapi/linux/quic.h              |   68 ++
 include/uapi/linux/snmp.h              |    9 +
 include/uapi/linux/udp.h               |    4 +
 net/Kconfig                            |    1 +
 net/Makefile                           |    1 +
 net/ipv4/Makefile                      |    3 +-
 net/ipv4/udp.c                         |   15 +
 net/ipv4/udp_ulp.c                     |  192 +++
 net/quic/Kconfig                       |   16 +
 net/quic/Makefile                      |    8 +
 net/quic/quic_main.c                   | 1533 ++++++++++++++++++++++++
 net/quic/quic_proc.c                   |   45 +
 security/security.c                    |    1 +
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    3 +-
 tools/testing/selftests/net/quic.c     | 1369 +++++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   46 +
 24 files changed, 3636 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/networking/quic.rst
 create mode 100644 include/net/quic.h
 create mode 100644 include/uapi/linux/quic.h
 create mode 100644 net/ipv4/udp_ulp.c
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c
 create mode 100644 net/quic/quic_proc.c
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

-- 
2.30.2


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [net-next v4 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-09-09  0:12 ` [net-next v4 0/6] net: support QUIC crypto Adel Abouchaev
@ 2022-09-09  0:12   ` Adel Abouchaev
  2022-09-09  1:40     ` Bagas Sanjaya
  2022-09-09  0:12   ` [net-next v4 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-09  0:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, paul,
	jmorris, serge, linux-security-module, netdev, linux-doc,
	linux-kselftest
  Cc: kernel test robot

Add documentation for kernel QUIC code.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Added quic.rst reference to the index.rst file; fixed indentation in the
quic.rst file.
Reported-by: kernel test robot <lkp@intel.com>

Added SPDX license GPL 2.0.
v2: Removed whitespace at EOF.
v3: Added explanation of features.
v4: Updated and formatted the doc for readability.
---
 Documentation/networking/index.rst |   1 +
 Documentation/networking/quic.rst  | 215 +++++++++++++++++++++++++++++
 2 files changed, 216 insertions(+)
 create mode 100644 Documentation/networking/quic.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index bacadd09e570..0dacd8c8a3ff 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -89,6 +89,7 @@ Contents:
    plip
    ppp_generic
    proc_net_tcp
+   quic
    radiotap-headers
    rds
    regulatory
diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..48861c458381
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,215 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========
+KERNEL QUIC
+===========
+
+Overview
+========
+
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 [#rfc9000]_ for the standard document.
+
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are 1RTT packets
+in a QUIC connection. Encryption of all other packets is still done by the
+QUIC library in userspace.
+
+The flow match is performed using 5 parameters: source and destination IP
+addresses, source and destination UDP ports and destination QUIC connection ID.
+Not all these parameters are always needed. The Tx direction matches the flow
+on the destination IP, port and destination connection ID; while the Rx
+direction would later match on source IP, port and destination connection ID.
+This will cover multiple scenarios where the server is using ``SO_REUSEADDR``
+and/or empty destination connection IDs, or a combination of these.
+
+The Rx direction is not implemented yet.
+
+Connection migration is not handled by the kernel code and is left to the
+user space portion of the QUIC library. In the Tx direction, the new key is
+installed before a packet with an updated destination is sent. In the Rx
+direction, a packet is dropped if its flow is missing.
+
+For key rotation, the behavior on Tx is to drop packets when no encryption
+key with a matching key rotation bit is present. On Rx, such a packet will be
+passed to the userspace library with the header decrypted and the payload
+still encrypted, and a separate indication in the ancillary data will report
+that the packet did not match the current key bit. The key rotation bit
+cannot be used as part of the flow lookup key because it is covered by header
+protection. A special provision will be needed in user mode to keep
+attempting the payload decryption in order to prevent timing attacks.
+
+
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+
+A QUIC connection originates and terminates in the application, using one of
+the many available QUIC libraries. The code instantiates the client and server
+in some form and configures them to use certain addresses and ports for the
+source and destination. The client and server negotiate the set of keys to
+protect the communication during different phases of the connection, maintain
+the connection and perform congestion control.
+
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow that should be encrypted by the kernel needs to be registered with
+the kernel using the socket API. A ``setsockopt()`` call on the socket creates
+an association between the QUIC connection ID of the flow and the encryption
+parameters for the crypto operations:
+
+.. code-block:: c
+
+	struct quic_connection_info conn_info;
+	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+	const size_t conn_id_len = sizeof(conn_id);
+	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			    0x08, 0x09, 0x0a, 0x0b};
+	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
+				};
+
+	conn_info.conn_payload_key_gen = 0;
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.dst_conn_id_length = conn_id_len;
+	memcpy(conn_info.key.dst_conn_id, conn_id, conn_id_len);
+	/* The Tx flow is also matched on the destination address and port;
+	 * fill conn_info.key.addr and conn_info.key.udp_port accordingly.
+	 */
+
+	memcpy(conn_info.aes_gcm_128.payload_key, conn_key, sizeof(conn_key));
+	memcpy(conn_info.aes_gcm_128.payload_iv, conn_iv, sizeof(conn_iv));
+	memcpy(conn_info.aes_gcm_128.header_key, conn_hdr_key,
+	       sizeof(conn_hdr_key));
+
+	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+
+Requesting to remove QUIC Tx kernel crypto offload
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. To explicitly remove the
+offload for a connection during the lifetime of the socket, the process is
+similar to adding the flow. Only the connection key needs to be supplied to
+remove the connection from the offload:
+
+.. code-block:: c
+
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.dst_conn_id_length = conn_id_len;
+	memcpy(conn_info.key.dst_conn_id, conn_id, conn_id_len);
+	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+Sending application data
+------------------------
+
+For Tx encryption offload, the application should use the ``sendmsg()`` socket
+call and provide ancillary data carrying the next packet number, the
+connection ID length and the offload flags, so that the kernel can perform
+the encryption, and GSO if requested.
+
+.. code-block:: c
+
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_tx_ancillary_data *anc_data;
+	size_t quic_data_len = 4500;
+	struct cmsghdr *cmsg_hdr;
+	char quic_data[9000];
+	struct iovec iov[2];
+	int send_len = 9000;
+	struct msghdr msg;
+	int err;
+
+	iov[0].iov_base = quic_data;
+	iov[0].iov_len = quic_data_len;
+	iov[1].iov_base = quic_data + 4500;
+	iov[1].iov_len = quic_data_len;
+
+	if (client.addr.sin_family == AF_INET) {
+		msg.msg_name = &client.addr;
+		msg.msg_namelen = sizeof(client.addr);
+	} else {
+		msg.msg_name = &client.addr6;
+		msg.msg_namelen = sizeof(client.addr6);
+	}
+
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->dst_conn_id_length = conn_id_len;
+	err = sendmsg(fd, &msg, 0);
+
+QUIC Tx offload in kernel will read the data from userspace, encrypt and
+copy it to the ciphertext within the same operation.
+
+
+Sending QUIC application data with GSO
+--------------------------------------
+
+When GSO is in use, the kernel will use the GSO fragment size as the target
+for ciphertext. The packets from the user space should align on the boundary
+of the fragment size minus the tag size for the chosen cipher. For example,
+if the fragment size is 1200 bytes and the tag size is 16 bytes, the plain
+packets should follow each other at every 1184 bytes. After the encryption,
+the rest of UDP and IP stacks will follow the defined value of the fragment,
+which includes the trailing tag bytes.
+
+To set up GSO fragmentation:
+
+.. code-block:: c
+
+	setsockopt(fd, SOL_UDP, UDP_SEGMENT, &frag_size,
+		   sizeof(frag_size));
+
+If the fragment size is provided in ancillary data within the ``sendmsg()``
+call, the value in ancillary data will take precedence over the segment size
+provided in setsockopt to split the payload into packets. This is consistent
+with the UDP stack behavior.
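+
+For illustration, the per-call segment size can also be supplied as a
+``UDP_SEGMENT`` control message, which is standard UDP GSO behavior. The
+sketch below shows only that control message; in a real call it would be
+chained after the ``UDP_QUIC_ENCRYPT`` control message with ``CMSG_NXTHDR()``,
+and the buffer and variable names are illustrative only:
+
+.. code-block:: c
+
+	uint16_t frag_size = 1200; /* includes the trailing tag bytes */
+	uint8_t seg_cmsg_buf[CMSG_SPACE(sizeof(uint16_t))];
+	struct cmsghdr *seg_cmsg;
+
+	msg.msg_control = seg_cmsg_buf;
+	msg.msg_controllen = sizeof(seg_cmsg_buf);
+	seg_cmsg = CMSG_FIRSTHDR(&msg);
+	seg_cmsg->cmsg_level = SOL_UDP;
+	seg_cmsg->cmsg_type = UDP_SEGMENT;
+	seg_cmsg->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+	memcpy(CMSG_DATA(seg_cmsg), &frag_size, sizeof(frag_size));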
+
+Integrating with userspace QUIC libraries
+------------------------------------------
+
+How a userspace QUIC library integrates with the offload depends on the
+implementation of the QUIC protocol. For the MVFST library [#mvfst]_, the
+control plane is integrated into the handshake callbacks to properly configure
+the flows on the socket, and the data plane is integrated into the methods
+that perform encryption and send the packets to the batch scheduler for
+transmission to the socket.
+
+Statistics
+==========
+
+QUIC Tx offload to the kernel exposes the following counters in
+``/proc/net/quic_stat``:
+
+- ``QuicCurrTxSw`` -
+  number of currently active kernel offloaded QUIC connections
+- ``QuicTxSw`` -
+  cumulative total number of offloaded QUIC connections
+- ``QuicTxSwError`` -
+  cumulative total number of errors during QUIC Tx offload to the kernel
+
+References
+==========
+
+.. [#rfc9000] https://datatracker.ietf.org/doc/html/rfc9000
+.. [#mvfst] https://github.com/facebookincubator/mvfst
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v4 2/6] net: Define QUIC specific constants, control and data plane structures
  2022-09-09  0:12 ` [net-next v4 0/6] net: support QUIC crypto Adel Abouchaev
  2022-09-09  0:12   ` [net-next v4 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-09-09  0:12   ` Adel Abouchaev
  2022-09-09  0:12   ` [net-next v4 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-09  0:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, paul,
	jmorris, serge, linux-security-module, netdev, linux-doc,
	linux-kselftest

Define control and data plane structures that are passed in the control
plane for flow add/remove, and in ancillary data during packet send. Define
constants to use within SOL_UDP to program QUIC sockets.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

v3: added a 3-tuple to map a flow to a key, added key generation to
include into flow context.
v4: added missing linux/in.h and linux/in6.h to the quic.h file.
---
 include/uapi/linux/quic.h | 68 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/udp.h  |  3 ++
 2 files changed, 71 insertions(+)
 create mode 100644 include/uapi/linux/quic.h

diff --git a/include/uapi/linux/quic.h b/include/uapi/linux/quic.h
new file mode 100644
index 000000000000..3a281a3037f0
--- /dev/null
+++ b/include/uapi/linux/quic.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_QUIC_H
+#define _UAPI_LINUX_QUIC_H
+
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/types.h>
+#include <linux/tls.h>
+
+#define QUIC_MAX_CONNECTION_ID_SIZE	20
+
+/* Side by side data for QUIC egress operations */
+#define QUIC_BYPASS_ENCRYPTION	0x01
+
+struct quic_tx_ancillary_data {
+	__aligned_u64	next_pkt_num;
+	__u8	flags;
+	__u8	dst_conn_id_length;
+};
+
+struct quic_connection_info_key {
+	__u8	dst_conn_id[QUIC_MAX_CONNECTION_ID_SIZE];
+	__u8	dst_conn_id_length;
+	union {
+		struct in6_addr ipv6_addr;
+		struct in_addr  ipv4_addr;
+	} addr;
+	__be16  udp_port;
+};
+
+struct quic_aes_gcm_128 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+};
+
+struct quic_aes_gcm_256 {
+	__u8	header_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_GCM_256_IV_SIZE];
+};
+
+struct quic_aes_ccm_128 {
+	__u8	header_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_AES_CCM_128_IV_SIZE];
+};
+
+struct quic_chacha20_poly1305 {
+	__u8	header_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	__u8	payload_iv[TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE];
+};
+
+struct quic_connection_info {
+	__u16	cipher_type;
+	struct quic_connection_info_key		key;
+	__u8	conn_payload_key_gen;
+	union {
+		struct quic_aes_gcm_128 aes_gcm_128;
+		struct quic_aes_gcm_256 aes_gcm_256;
+		struct quic_aes_ccm_128 aes_ccm_128;
+		struct quic_chacha20_poly1305 chacha20_poly1305;
+	};
+};
+
+#endif
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 4828794efcf8..0ee4c598e70b 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,9 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
+#define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
+#define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v4 3/6] net: Add UDP ULP operations, initialization and handling prototype functions.
  2022-09-09  0:12 ` [net-next v4 0/6] net: support QUIC crypto Adel Abouchaev
  2022-09-09  0:12   ` [net-next v4 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
  2022-09-09  0:12   ` [net-next v4 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
@ 2022-09-09  0:12   ` Adel Abouchaev
  2022-09-09  0:12   ` [net-next v4 4/6] net: Implement QUIC offload functions Adel Abouchaev
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-09  0:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, paul,
	jmorris, serge, linux-security-module, netdev, linux-doc,
	linux-kselftest

Define functions to add UDP ULP handling, registration with the UDP
protocol and the supporting data structures. Create a structure for the
QUIC ULP and add empty prototype functions to support it.
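
For illustration, a minimal sketch of a module registering itself with this
interface; the "sample" name and the empty callbacks are assumptions, the
pattern mirrors the existing TCP ULP one:

	static int sample_ulp_init(struct sock *sk)
	{
		/* allocate per-socket state, e.g. into inet_sk(sk)->ulp_data */
		return 0;
	}

	static void sample_ulp_release(struct sock *sk)
	{
		/* free whatever init() set up */
	}

	static struct udp_ulp_ops sample_udp_ulp_ops = {
		.name		= "sample",
		.owner		= THIS_MODULE,
		.init		= sample_ulp_init,
		.release	= sample_ulp_release,
	};

	static int __init sample_ulp_register(void)
	{
		return udp_register_ulp(&sample_udp_ulp_ops);
	}

	static void __exit sample_ulp_unregister(void)
	{
		udp_unregister_ulp(&sample_udp_ulp_ops);
	}

	module_init(sample_ulp_register);
	module_exit(sample_ulp_unregister);
	MODULE_ALIAS_UDP_ULP("sample");
	MODULE_LICENSE("GPL");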

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Removed reference to net/quic/Kconfig from this patch into the next.

Fixed formatting around brackets.
---
 include/net/inet_sock.h  |   2 +
 include/net/udp.h        |  33 +++++++
 include/uapi/linux/udp.h |   1 +
 net/Makefile             |   1 +
 net/ipv4/Makefile        |   3 +-
 net/ipv4/udp.c           |   6 ++
 net/ipv4/udp_ulp.c       | 192 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 237 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/udp_ulp.c

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index bf5654ce711e..650e332bdb50 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -249,6 +249,8 @@ struct inet_sock {
 	__be32			mc_addr;
 	struct ip_mc_socklist __rcu	*mc_list;
 	struct inet_cork_full	cork;
+	const struct udp_ulp_ops	*udp_ulp_ops;
+	void __rcu		*ulp_data;
 };
 
 #define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
diff --git a/include/net/udp.h b/include/net/udp.h
index 5ee88ddf79c3..f22ebabbb186 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -523,4 +523,37 @@ struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
 int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 #endif
 
+/*
+ * Interface for adding Upper Level Protocols over UDP
+ */
+
+#define UDP_ULP_NAME_MAX	16
+#define UDP_ULP_MAX		128
+
+struct udp_ulp_ops {
+	struct list_head	list;
+
+	/* initialize ulp */
+	int (*init)(struct sock *sk);
+	/* cleanup ulp */
+	void (*release)(struct sock *sk);
+
+	char		name[UDP_ULP_NAME_MAX];
+	struct module	*owner;
+};
+
+int udp_register_ulp(struct udp_ulp_ops *type);
+void udp_unregister_ulp(struct udp_ulp_ops *type);
+int udp_set_ulp(struct sock *sk, const char *name);
+void udp_get_available_ulp(char *buf, size_t len);
+void udp_cleanup_ulp(struct sock *sk);
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval,
+		       unsigned int optlen);
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval,
+		       int __user *optlen);
+
+#define MODULE_ALIAS_UDP_ULP(name)\
+	__MODULE_INFO(alias, alias_userspace, name);\
+	__MODULE_INFO(alias, alias_udp_ulp, "udp-ulp-" name)
+
 #endif	/* _UDP_H */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 0ee4c598e70b..893691f0108a 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
 #define UDP_GRO		104	/* This socket can receive UDP GRO packets */
+#define UDP_ULP		105	/* Attach ULP to a UDP socket */
 #define UDP_QUIC_ADD_TX_CONNECTION	106 /* Add QUIC Tx crypto offload */
 #define UDP_QUIC_DEL_TX_CONNECTION	107 /* Del QUIC Tx crypto offload */
 #define UDP_QUIC_ENCRYPT		108 /* QUIC encryption parameters */
diff --git a/net/Makefile b/net/Makefile
index 6a62e5b27378..021ea3698d3a 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -16,6 +16,7 @@ obj-y				+= ethernet/ 802/ sched/ netlink/ bpf/ ethtool/
 obj-$(CONFIG_NETFILTER)		+= netfilter/
 obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_TLS)		+= tls/
+obj-$(CONFIG_QUIC)		+= quic/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX_SCM)		+= unix/
 obj-y				+= ipv6/
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index bbdd9c44f14e..88d3baf4af95 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,8 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o
+	     metrics.o netlink.o nexthop.o udp_tunnel_stub.o \
+	     udp_ulp.o
 
 obj-$(CONFIG_BPFILTER) += bpfilter/
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cd72158e953a..0f5c842dbd3f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2781,6 +2781,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->pcflag |= UDPLITE_RECV_CC;
 		break;
 
+	case UDP_ULP:
+		return udp_setsockopt_ulp(sk, optval, optlen);
+
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -2849,6 +2852,9 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
 		val = up->pcrlen;
 		break;
 
+	case UDP_ULP:
+		return udp_getsockopt_ulp(sk, optval, optlen);
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/udp_ulp.c b/net/ipv4/udp_ulp.c
new file mode 100644
index 000000000000..138818690151
--- /dev/null
+++ b/net/ipv4/udp_ulp.c
@@ -0,0 +1,192 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Pluggable UDP upper layer protocol support, based on pluggable TCP upper
+ * layer protocol support.
+ *
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwatson@fb.com>. All rights
+ * reserved.
+ * Copyright (c) 2021-2022, Meta Platforms, Inc. All rights reserved.
+ */
+
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/skmsg.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+
+static DEFINE_SPINLOCK(udp_ulp_list_lock);
+static LIST_HEAD(udp_ulp_list);
+
+/* Simple linear search, don't expect many entries! */
+static struct udp_ulp_ops *udp_ulp_find(const char *name)
+{
+	struct udp_ulp_ops *e;
+
+	list_for_each_entry_rcu(e, &udp_ulp_list, list,
+				lockdep_is_held(&udp_ulp_list_lock)) {
+		if (strcmp(e->name, name) == 0)
+			return e;
+	}
+
+	return NULL;
+}
+
+static const struct udp_ulp_ops *__udp_ulp_find_autoload(const char *name)
+{
+	const struct udp_ulp_ops *ulp = NULL;
+
+	rcu_read_lock();
+	ulp = udp_ulp_find(name);
+
+#ifdef CONFIG_MODULES
+	if (!ulp && capable(CAP_NET_ADMIN)) {
+		rcu_read_unlock();
+		request_module("udp-ulp-%s", name);
+		rcu_read_lock();
+		ulp = udp_ulp_find(name);
+	}
+#endif
+	if (!ulp || !try_module_get(ulp->owner))
+		ulp = NULL;
+
+	rcu_read_unlock();
+	return ulp;
+}
+
+/* Attach new upper layer protocol to the list
+ * of available protocols.
+ */
+int udp_register_ulp(struct udp_ulp_ops *ulp)
+{
+	int ret = 0;
+
+	spin_lock(&udp_ulp_list_lock);
+	if (udp_ulp_find(ulp->name))
+		ret = -EEXIST;
+	else
+		list_add_tail_rcu(&ulp->list, &udp_ulp_list);
+
+	spin_unlock(&udp_ulp_list_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(udp_register_ulp);
+
+void udp_unregister_ulp(struct udp_ulp_ops *ulp)
+{
+	spin_lock(&udp_ulp_list_lock);
+	list_del_rcu(&ulp->list);
+	spin_unlock(&udp_ulp_list_lock);
+
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(udp_unregister_ulp);
+
+void udp_cleanup_ulp(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	/* No sock_owned_by_me() check here as at the time the
+	 * stack calls this function, the socket is dead and
+	 * about to be destroyed.
+	 */
+	if (!inet->udp_ulp_ops)
+		return;
+
+	if (inet->udp_ulp_ops->release)
+		inet->udp_ulp_ops->release(sk);
+	module_put(inet->udp_ulp_ops->owner);
+
+	inet->udp_ulp_ops = NULL;
+}
+
+static int __udp_set_ulp(struct sock *sk, const struct udp_ulp_ops *ulp_ops)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int err;
+
+	err = -EEXIST;
+	if (inet->udp_ulp_ops)
+		goto out_err;
+
+	err = ulp_ops->init(sk);
+	if (err)
+		goto out_err;
+
+	inet->udp_ulp_ops = ulp_ops;
+	return 0;
+
+out_err:
+	module_put(ulp_ops->owner);
+	return err;
+}
+
+int udp_set_ulp(struct sock *sk, const char *name)
+{
+	struct sk_psock *psock = sk_psock_get(sk);
+	const struct udp_ulp_ops *ulp_ops;
+
+	if (psock) {
+		sk_psock_put(sk, psock);
+		return -EINVAL;
+	}
+
+	sock_owned_by_me(sk);
+	ulp_ops = __udp_ulp_find_autoload(name);
+	if (!ulp_ops)
+		return -ENOENT;
+
+	return __udp_set_ulp(sk, ulp_ops);
+}
+
+int udp_setsockopt_ulp(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	char name[UDP_ULP_NAME_MAX];
+	int val, err;
+
+	if (!optlen || optlen > UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	val = strncpy_from_sockptr(name, optval, optlen);
+	if (val < 0)
+		return -EFAULT;
+
+	if (val == UDP_ULP_NAME_MAX)
+		return -EINVAL;
+
+	name[val] = 0;
+	lock_sock(sk);
+	err = udp_set_ulp(sk, name);
+	release_sock(sk);
+	return err;
+}
+
+int udp_getsockopt_ulp(struct sock *sk, char __user *optval, int __user *optlen)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	int len;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	if (len < 0)
+		return -EINVAL;
+
+	len = min_t(unsigned int, len, UDP_ULP_NAME_MAX);
+
+	if (!inet->udp_ulp_ops) {
+		if (put_user(0, optlen))
+			return -EFAULT;
+		return 0;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, inet->udp_ulp_ops->name, len))
+		return -EFAULT;
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [net-next v4 4/6] net: Implement QUIC offload functions
  2022-09-09  0:12 ` [net-next v4 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (2 preceding siblings ...)
  2022-09-09  0:12   ` [net-next v4 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
@ 2022-09-09  0:12   ` Adel Abouchaev
  2022-09-09  0:12   ` [net-next v4 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
  2022-09-09  0:12   ` [net-next v4 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-09  0:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, paul,
	jmorris, serge, linux-security-module, netdev, linux-doc,
	linux-kselftest

Add a connection hash to the context to support add and remove operations
on QUIC connections for the control plane and lookup for the data plane.
Implement setsockopt and add placeholders to add and delete Tx
connections.
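
For illustration, the expected userspace sequence is roughly as follows; the
ULP name "quic" is an assumption here, and conn_info would be filled as
described in Documentation/networking/quic.rst:

	setsockopt(fd, SOL_UDP, UDP_ULP, "quic", sizeof("quic"));
	setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
		   sizeof(conn_info));
	/* ... sendmsg() calls with UDP_QUIC_ENCRYPT ancillary data ... */
	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
		   sizeof(conn_info));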

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Added net/quic/Kconfig reference to net/Kconfig in this commit.

Initialized pointers with NULL vs 0. Restricted AES counter to __le32.
Added address space qualifiers to user space addresses. Removed empty
lines. Updated code alignment. Removed inlines.

v3: removed ITER_KVEC flag from iov_iter_kvec call.
v3: fixed Chacha20 encryption bug.
v3: updated to match the uAPI struct fields
v3: updated Tx flow to match on dst ip, dst port and connection id.
v3: updated to drop packets if key generations do not match.
v4: made the discovered packet size signed size_t.
v4: exported security_socket_sendmsg from security/security.c.
---
 include/net/quic.h   |   53 ++
 net/Kconfig          |    1 +
 net/ipv4/udp.c       |    9 +
 net/quic/Kconfig     |   16 +
 net/quic/Makefile    |    8 +
 net/quic/quic_main.c | 1487 ++++++++++++++++++++++++++++++++++++++++++
 security/security.c  |    1 +
 7 files changed, 1575 insertions(+)
 create mode 100644 include/net/quic.h
 create mode 100644 net/quic/Kconfig
 create mode 100644 net/quic/Makefile
 create mode 100644 net/quic/quic_main.c

diff --git a/include/net/quic.h b/include/net/quic.h
new file mode 100644
index 000000000000..cafe01174e60
--- /dev/null
+++ b/include/net/quic.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef INCLUDE_NET_QUIC_H
+#define INCLUDE_NET_QUIC_H
+
+#include <linux/mutex.h>
+#include <linux/rhashtable.h>
+#include <linux/skmsg.h>
+#include <uapi/linux/quic.h>
+
+#define QUIC_MAX_SHORT_HEADER_SIZE      25
+#define QUIC_MAX_CONNECTION_ID_SIZE     20
+#define QUIC_HDR_MASK_SIZE              16
+#define QUIC_MAX_GSO_FRAGS              16
+
+// Maximum IV and nonce sizes should be in sync with supported ciphers.
+#define QUIC_CIPHER_MAX_IV_SIZE		12
+#define QUIC_CIPHER_MAX_NONCE_SIZE	16
+
+/* Flags carried in ancillary data for QUIC egress operations */
+#define QUIC_ANCILLARY_FLAGS    (QUIC_BYPASS_ENCRYPTION)
+
+#define QUIC_MAX_IOVEC_SEGMENTS		8
+#define QUIC_MAX_SG_ALLOC_ELEMENTS	32
+#define QUIC_MAX_PLAIN_PAGES		16
+#define QUIC_MAX_CIPHER_PAGES_ORDER	4
+
+struct quic_internal_crypto_context {
+	struct quic_connection_info	conn_info;
+	struct crypto_skcipher		*header_tfm;
+	struct crypto_aead		*packet_aead;
+};
+
+struct quic_connection_rhash {
+	struct rhash_head			node;
+	struct quic_internal_crypto_context	crypto_ctx;
+	struct rcu_head				rcu;
+};
+
+struct quic_context {
+	struct proto		*sk_proto;
+	struct rhashtable	tx_connections;
+	struct scatterlist	sg_alloc[QUIC_MAX_SG_ALLOC_ELEMENTS];
+	struct page		*cipher_page;
+	/* To synchronize concurrent sendmsg() requests through the same socket
+	 * and protect preallocated per-context memory.
+	 */
+	struct mutex		sendmsg_mux;
+	struct rcu_head		rcu;
+};
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 48c33c222199..6824d07b9e57 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -63,6 +63,7 @@ menu "Networking options"
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/tls/Kconfig"
+source "net/quic/Kconfig"
 source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 source "net/smc/Kconfig"
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0f5c842dbd3f..4de5fde9a291 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,6 +113,7 @@
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
+#include <uapi/linux/quic.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6_stubs.h>
 #endif
@@ -1013,6 +1014,14 @@ static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
 			return -EINVAL;
 		*gso_size = *(__u16 *)CMSG_DATA(cmsg);
 		return 0;
+	case UDP_QUIC_ENCRYPT:
+		/* This option is handled in UDP_ULP and is only checked
+		 * here for the bypass bit
+		 */
+		if (cmsg->cmsg_len !=
+		    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+			return -EINVAL;
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/quic/Kconfig b/net/quic/Kconfig
new file mode 100644
index 000000000000..661cb989508a
--- /dev/null
+++ b/net/quic/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# QUIC configuration
+#
+config QUIC
+	tristate "QUIC encryption offload"
+	depends on INET
+	select CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_GCM
+	help
+	  Enable kernel support for QUIC crypto offload. Currently only TX
+	  encryption offload is supported. The kernel will perform
+	  copy-during-encryption.
+
+	  If unsure, say N.
diff --git a/net/quic/Makefile b/net/quic/Makefile
new file mode 100644
index 000000000000..928239c4d08c
--- /dev/null
+++ b/net/quic/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the QUIC subsystem
+#
+
+obj-$(CONFIG_QUIC) += quic.o
+
+quic-y := quic_main.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
new file mode 100644
index 000000000000..32535f7b7f11
--- /dev/null
+++ b/net/quic/quic_main.c
@@ -0,0 +1,1487 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <crypto/skcipher.h>
+#include <linux/bug.h>
+#include <linux/module.h>
+#include <linux/rhashtable.h>
+// Include header to use TLS constants for AEAD cipher.
+#include <net/tls.h>
+#include <net/quic.h>
+#include <net/udp.h>
+#include <uapi/linux/quic.h>
+
+static unsigned long af_init_done;
+static struct proto quic_v4_proto;
+static struct proto quic_v6_proto;
+static DEFINE_SPINLOCK(quic_proto_lock);
+
+static u32 quic_tx_connection_hash(const void *data, u32 len, u32 seed)
+{
+	return jhash(data, len, seed);
+}
+
+static u32 quic_tx_connection_hash_obj(const void *data, u32 len, u32 seed)
+{
+	const struct quic_connection_rhash *connhash = data;
+
+	return jhash(&connhash->crypto_ctx.conn_info.key,
+		     sizeof(struct quic_connection_info_key), seed);
+}
+
+static int quic_tx_connection_hash_cmp(struct rhashtable_compare_arg *arg,
+				       const void *ptr)
+{
+	const struct quic_connection_info_key *key = arg->key;
+	const struct quic_connection_rhash *x = ptr;
+
+	return !!memcmp(&x->crypto_ctx.conn_info.key,
+			key,
+			sizeof(struct quic_connection_info_key));
+}
+
+static const struct rhashtable_params quic_tx_connection_params = {
+	.key_len		= sizeof(struct quic_connection_info_key),
+	.head_offset		= offsetof(struct quic_connection_rhash, node),
+	.hashfn			= quic_tx_connection_hash,
+	.obj_hashfn		= quic_tx_connection_hash_obj,
+	.obj_cmpfn		= quic_tx_connection_hash_cmp,
+	.automatic_shrinking	= true,
+};
+
+static size_t quic_crypto_key_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_KEY_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_KEY_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_KEY_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE;
+	default:
+		break;
+	}
+	WARN(1, "Unsupported cipher type\n");
+	return 0;
+}
+
+static size_t quic_crypto_tag_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		return TLS_CIPHER_AES_GCM_256_TAG_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		return TLS_CIPHER_AES_CCM_128_TAG_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
+	default:
+		break;
+	}
+	WARN(1, "Unsupported cipher type\n");
+	return 0;
+}
+
+static size_t quic_crypto_nonce_size(u16 cipher_type)
+{
+	switch (cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_128_SALT_SIZE;
+	case TLS_CIPHER_AES_GCM_256:
+		BUILD_BUG_ON(TLS_CIPHER_AES_GCM_256_IV_SIZE +
+			     TLS_CIPHER_AES_GCM_256_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_GCM_256_IV_SIZE +
+		       TLS_CIPHER_AES_GCM_256_SALT_SIZE;
+	case TLS_CIPHER_AES_CCM_128:
+		BUILD_BUG_ON(TLS_CIPHER_AES_CCM_128_IV_SIZE +
+			     TLS_CIPHER_AES_CCM_128_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_AES_CCM_128_IV_SIZE +
+		       TLS_CIPHER_AES_CCM_128_SALT_SIZE;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		BUILD_BUG_ON(TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+			     TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE >
+			     QUIC_CIPHER_MAX_NONCE_SIZE);
+		return TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE +
+		       TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE;
+	default:
+		break;
+	}
+	WARN(1, "Unsupported cipher type\n");
+	return 0;
+}
+
+static u8 *quic_payload_iv(struct quic_internal_crypto_context *crypto_ctx)
+{
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		return crypto_ctx->conn_info.aes_gcm_128.payload_iv;
+	case TLS_CIPHER_AES_GCM_256:
+		return crypto_ctx->conn_info.aes_gcm_256.payload_iv;
+	case TLS_CIPHER_AES_CCM_128:
+		return crypto_ctx->conn_info.aes_ccm_128.payload_iv;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		return crypto_ctx->conn_info.chacha20_poly1305.payload_iv;
+	default:
+		break;
+	}
+	WARN(1, "Unsupported cipher type\n");
+	return NULL;
+}
+
+static int
+quic_config_header_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_skcipher *tfm;
+	char *header_cipher;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_128.header_key;
+		break;
+	case TLS_CIPHER_AES_GCM_256:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_gcm_256.header_key;
+		break;
+	case TLS_CIPHER_AES_CCM_128:
+		header_cipher = "ecb(aes)";
+		key = crypto_ctx->conn_info.aes_ccm_128.header_key;
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		header_cipher = "chacha20";
+		key = crypto_ctx->conn_info.chacha20_poly1305.header_key;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	tfm = crypto_alloc_skcipher(header_cipher, 0, 0);
+	if (IS_ERR(tfm)) {
+		rc = PTR_ERR(tfm);
+		goto out;
+	}
+
+	rc = crypto_skcipher_setkey(tfm, key,
+				    quic_crypto_key_size(crypto_ctx->conn_info
+							 .cipher_type));
+	if (rc) {
+		crypto_free_skcipher(tfm);
+		goto out;
+	}
+
+	crypto_ctx->header_tfm = tfm;
+
+out:
+	return rc;
+}
+
+static int
+quic_config_packet_crypto(struct quic_internal_crypto_context *crypto_ctx)
+{
+	struct crypto_aead *aead;
+	char *cipher_name;
+	int rc = 0;
+	char *key;
+
+	switch (crypto_ctx->conn_info.cipher_type) {
+	case TLS_CIPHER_AES_GCM_128: {
+		key = crypto_ctx->conn_info.aes_gcm_128.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_GCM_256: {
+		key = crypto_ctx->conn_info.aes_gcm_256.payload_key;
+		cipher_name = "gcm(aes)";
+		break;
+	}
+	case TLS_CIPHER_AES_CCM_128: {
+		key = crypto_ctx->conn_info.aes_ccm_128.payload_key;
+		cipher_name = "ccm(aes)";
+		break;
+	}
+	case TLS_CIPHER_CHACHA20_POLY1305: {
+		key = crypto_ctx->conn_info.chacha20_poly1305.payload_key;
+		cipher_name = "rfc7539(chacha20,poly1305)";
+		break;
+	}
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+
+	aead = crypto_alloc_aead(cipher_name, 0, 0);
+	if (IS_ERR(aead)) {
+		rc = PTR_ERR(aead);
+		goto out;
+	}
+
+	rc = crypto_aead_setkey(aead, key,
+				quic_crypto_key_size(crypto_ctx->conn_info
+						     .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	rc = crypto_aead_setauthsize(aead,
+				     quic_crypto_tag_size(crypto_ctx->conn_info
+							  .cipher_type));
+	if (rc)
+		goto free_aead;
+
+	crypto_ctx->packet_aead = aead;
+	goto out;
+
+free_aead:
+	crypto_free_aead(aead);
+
+out:
+	return rc;
+}
+
+static inline struct quic_context *quic_get_ctx(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	return (__force void *)rcu_access_pointer(inet->ulp_data);
+}
+
+static void quic_free_cipher_page(struct page *page)
+{
+	__free_pages(page, QUIC_MAX_CIPHER_PAGES_ORDER);
+}
+
+static struct quic_context *quic_ctx_create(void)
+{
+	struct quic_context *ctx;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+
+	mutex_init(&ctx->sendmsg_mux);
+	ctx->cipher_page = alloc_pages(GFP_KERNEL, QUIC_MAX_CIPHER_PAGES_ORDER);
+	if (!ctx->cipher_page)
+		goto out_err;
+
+	if (rhashtable_init(&ctx->tx_connections,
+			    &quic_tx_connection_params) < 0) {
+		quic_free_cipher_page(ctx->cipher_page);
+		goto out_err;
+	}
+
+	return ctx;
+
+out_err:
+	kfree(ctx);
+	return NULL;
+}
+
+static int quic_getsockopt(struct sock *sk, int level, int optname,
+			   char __user *optval, int __user *optlen)
+{
+	struct quic_context *ctx = quic_get_ctx(sk);
+
+	return ctx->sk_proto->getsockopt(sk, level, optname, optval, optlen);
+}
+
+static void quic_update_key_if_mapped_ipv4(struct quic_connection_info_key *key)
+{
+	if (ipv6_addr_v4mapped(&key->addr.ipv6_addr)) {
+		key->addr.ipv6_addr.s6_addr32[0] =
+			key->addr.ipv6_addr.s6_addr32[3];
+		key->addr.ipv6_addr.s6_addr32[1] = 0;
+		key->addr.ipv6_addr.s6_addr32[2] = 0;
+		key->addr.ipv6_addr.s6_addr32[3] = 0;
+	}
+}
+
+static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	int rc = 0;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	connhash = kzalloc(sizeof(*connhash), GFP_KERNEL);
+	if (!connhash)
+		return -ENOMEM;
+
+	crypto_ctx = &connhash->crypto_ctx;
+	rc = copy_from_sockptr(&crypto_ctx->conn_info, optval,
+			       sizeof(crypto_ctx->conn_info));
+	if (rc) {
+		rc = -EFAULT;
+		goto err_crypto_info;
+	}
+
+	quic_update_key_if_mapped_ipv4(&crypto_ctx->conn_info.key);
+
+	if (crypto_ctx->conn_info.key.dst_conn_id_length >
+	    QUIC_MAX_CONNECTION_ID_SIZE) {
+		rc = -EINVAL;
+		goto err_crypto_info;
+	}
+
+	if (crypto_ctx->conn_info.conn_payload_key_gen > 1) {
+		rc = -EINVAL;
+		goto err_crypto_info;
+	}
+
+	/* create all crypto material for packet and header encryption */
+	rc = quic_config_header_crypto(crypto_ctx);
+	if (rc)
+		goto err_crypto_info;
+
+	rc = quic_config_packet_crypto(crypto_ctx);
+	if (rc)
+		goto err_free_skcipher;
+
+	// insert crypto data into hash per connection ID
+	rc = rhashtable_insert_fast(&ctx->tx_connections, &connhash->node,
+				    quic_tx_connection_params);
+	if (rc < 0)
+		goto err_free_ciphers;
+
+	return 0;
+
+err_free_ciphers:
+	crypto_free_aead(crypto_ctx->packet_aead);
+
+err_free_skcipher:
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+
+err_crypto_info:
+	/* wipe out all crypto material */
+	memzero_explicit(&connhash->crypto_ctx, sizeof(connhash->crypto_ctx));
+	kfree(connhash);
+	return rc;
+}
+
+static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
+			       unsigned int optlen)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	struct quic_connection_rhash *connhash;
+	struct quic_connection_info conn_info;
+
+	if (sockptr_is_null(optval))
+		return -EINVAL;
+
+	if (optlen != sizeof(struct quic_connection_info))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&conn_info, optval, optlen))
+		return -EFAULT;
+
+	if (conn_info.key.dst_conn_id_length >
+	    QUIC_MAX_CONNECTION_ID_SIZE)
+		return -EINVAL;
+
+	if (conn_info.conn_payload_key_gen > 1)
+		return -EINVAL;
+
+	quic_update_key_if_mapped_ipv4(&conn_info.key);
+
+	connhash = rhashtable_lookup_fast(&ctx->tx_connections,
+					  &conn_info.key,
+					  quic_tx_connection_params);
+	if (!connhash)
+		return -EINVAL;
+
+	rhashtable_remove_fast(&ctx->tx_connections,
+			       &connhash->node,
+			       quic_tx_connection_params);
+
+	crypto_ctx = &connhash->crypto_ctx;
+
+	crypto_free_skcipher(crypto_ctx->header_tfm);
+	crypto_free_aead(crypto_ctx->packet_aead);
+	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	kfree(connhash);
+
+	return 0;
+}
+
+static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
+			      unsigned int optlen)
+{
+	int rc = 0;
+
+	switch (optname) {
+	case UDP_QUIC_ADD_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_add_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	case UDP_QUIC_DEL_TX_CONNECTION:
+		lock_sock(sk);
+		rc = do_quic_conn_del_tx(sk, optval, optlen);
+		release_sock(sk);
+		break;
+	default:
+		rc = -ENOPROTOOPT;
+		break;
+	}
+
+	return rc;
+}
+
+static int quic_setsockopt(struct sock *sk, int level, int optname,
+			   sockptr_t optval, unsigned int optlen)
+{
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	sk_proto = ctx->sk_proto;
+	rcu_read_unlock();
+
+	if (level == SOL_UDP &&
+	    (optname == UDP_QUIC_ADD_TX_CONNECTION ||
+	     optname == UDP_QUIC_DEL_TX_CONNECTION))
+		return do_quic_setsockopt(sk, optname, optval, optlen);
+
+	return sk_proto->setsockopt(sk, level, optname, optval, optlen);
+}
+
+static int
+quic_extract_ancillary_data(struct msghdr *msg,
+			    struct quic_tx_ancillary_data *ancillary_data,
+			    u16 *udp_pkt_size)
+{
+	struct cmsghdr *cmsg_hdr = NULL;
+	void *ancillary_data_ptr = NULL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	for_each_cmsghdr(cmsg_hdr, msg) {
+		if (!CMSG_OK(msg, cmsg_hdr))
+			return -EINVAL;
+
+		if (cmsg_hdr->cmsg_level != IPPROTO_UDP)
+			continue;
+
+		if (cmsg_hdr->cmsg_type == UDP_QUIC_ENCRYPT) {
+			if (cmsg_hdr->cmsg_len !=
+			    CMSG_LEN(sizeof(struct quic_tx_ancillary_data)))
+				return -EINVAL;
+			memcpy((void *)ancillary_data, CMSG_DATA(cmsg_hdr),
+			       sizeof(struct quic_tx_ancillary_data));
+			ancillary_data_ptr = cmsg_hdr;
+		} else if (cmsg_hdr->cmsg_type == UDP_SEGMENT) {
+			if (cmsg_hdr->cmsg_len != CMSG_LEN(sizeof(u16)))
+				return -EINVAL;
+			memcpy((void *)udp_pkt_size, CMSG_DATA(cmsg_hdr),
+			       sizeof(u16));
+		}
+	}
+
+	if (!ancillary_data_ptr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int quic_sendmsg_validate(struct msghdr *msg)
+{
+	if (!iter_is_iovec(&msg->msg_iter))
+		return -EINVAL;
+
+	if (!msg->msg_controllen)
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct quic_connection_rhash
+*quic_lookup_connection(struct quic_context *ctx,
+			u8 *conn_id,
+			struct quic_tx_ancillary_data *ancillary_data,
+			sa_family_t sa_family,
+			void *addr,
+			__be16 port)
+{
+	struct quic_connection_info_key conn_key;
+	size_t addrlen;
+
+	// Lookup connection information by the connection key.
+	memset(&conn_key, 0, sizeof(struct quic_connection_info_key));
+	// fill the connection id up to the max connection ID length
+	if (ancillary_data->dst_conn_id_length > QUIC_MAX_CONNECTION_ID_SIZE)
+		return NULL;
+
+	conn_key.dst_conn_id_length = ancillary_data->dst_conn_id_length;
+	if (ancillary_data->dst_conn_id_length)
+		memcpy(conn_key.dst_conn_id,
+		       conn_id,
+		       ancillary_data->dst_conn_id_length);
+
+	addrlen = (sa_family == AF_INET) ? 4 : 16;
+	memcpy(&conn_key.addr, addr, addrlen);
+	conn_key.udp_port = port;
+
+	return rhashtable_lookup_fast(&ctx->tx_connections,
+				      &conn_key,
+				      quic_tx_connection_params);
+}
+
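+/* Worst-case number of scatterlist entries needed for the plaintext: each
+ * entry ends either at a page boundary or at a packet boundary, so at most
+ * one entry per touched page plus one per packet, plus one spare slot.
+ */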
+static int quic_sg_capacity_from_msg(const size_t pkt_size,
+				     const off_t offset,
+				     const size_t length)
+{
+	size_t	pages = 0;
+	size_t	pkts = 0;
+
+	pages = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+	pkts = DIV_ROUND_UP(length, pkt_size);
+	return pages + pkts + 1;
+}
+
+static void quic_put_plain_user_pages(struct page **pages, size_t nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (i == 0 || pages[i] != pages[i - 1])
+			put_page(pages[i]);
+}
+
+static int quic_get_plain_user_pages(struct msghdr * const msg,
+				     struct page **pages,
+				     int *page_indices)
+{
+	void __user	*data_addr;
+	size_t	nr_mapped = 0;
+	size_t	nr_pages = 0;
+	void	*page_addr;
+	size_t	count = 0;
+	off_t	data_off;
+	int	ret = 0;
+	int	i;
+
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		data_addr = msg->msg_iter.iov[i].iov_base;
+		if (!i)
+			data_addr += msg->msg_iter.iov_offset;
+		page_addr =
+			(void *)((unsigned long)data_addr & PAGE_MASK);
+
+		data_off = (unsigned long)data_addr & ~PAGE_MASK;
+		nr_pages =
+			DIV_ROUND_UP(data_off + msg->msg_iter.iov[i].iov_len,
+				     PAGE_SIZE);
+		if (nr_mapped + nr_pages > QUIC_MAX_PLAIN_PAGES) {
+			quic_put_plain_user_pages(pages, nr_mapped);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		count = get_user_pages((unsigned long)page_addr, nr_pages, 1,
+				       pages, NULL);
+		if (count < nr_pages) {
+			quic_put_plain_user_pages(pages, nr_mapped + count);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		page_indices[i] = nr_mapped;
+		nr_mapped += count;
+		pages += count;
+	}
+	ret = nr_mapped;
+
+out:
+	return ret;
+}
+
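+/* Build per-packet scatterlists over the pinned user pages: walk the iovec
+ * segments and split scatterlist entries at page boundaries and at plaintext
+ * packet boundaries (pkt_size bytes). Returns the number of packets built.
+ */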
+static int quic_sg_plain_from_mapped_msg(struct msghdr * const msg,
+					 struct page **plain_pages,
+					 void **iov_base_ptrs,
+					 void **iov_data_ptrs,
+					 const size_t plain_size,
+					 const size_t pkt_size,
+					 struct scatterlist * const sg_alloc,
+					 const size_t max_sg_alloc,
+					 struct scatterlist ** const sg_pkts,
+					 size_t *nr_plain_pages)
+{
+	int iov_page_indices[QUIC_MAX_IOVEC_SEGMENTS];
+	struct scatterlist *sg;
+	unsigned int pkt_i = 0;
+	ssize_t left_on_page;
+	size_t pkt_left;
+	unsigned int i;
+	size_t seg_len;
+	off_t page_ofs;
+	off_t seg_ofs;
+	int ret = 0;
+	int page_i;
+
+	if (msg->msg_iter.nr_segs >= QUIC_MAX_IOVEC_SEGMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = quic_get_plain_user_pages(msg, plain_pages, iov_page_indices);
+	if (ret < 0)
+		goto out;
+
+	*nr_plain_pages = ret;
+	sg = sg_alloc;
+	sg_pkts[pkt_i] = sg;
+	sg_unmark_end(sg);
+	pkt_left = pkt_size;
+	for (i = 0; i < msg->msg_iter.nr_segs; ++i) {
+		page_ofs = ((unsigned long)msg->msg_iter.iov[i].iov_base
+			   & (PAGE_SIZE - 1));
+		page_i = 0;
+		if (!i) {
+			page_ofs += msg->msg_iter.iov_offset;
+			while (page_ofs >= PAGE_SIZE) {
+				page_ofs -= PAGE_SIZE;
+				page_i++;
+			}
+		}
+
+		seg_len = msg->msg_iter.iov[i].iov_len;
+		page_i += iov_page_indices[i];
+
+		if (page_i >= QUIC_MAX_PLAIN_PAGES)
+			return -EFAULT;
+
+		seg_ofs = 0;
+		while (seg_ofs < seg_len) {
+			if (sg - sg_alloc > max_sg_alloc)
+				return -EFAULT;
+
+			sg_unmark_end(sg);
+			left_on_page = min_t(size_t, PAGE_SIZE - page_ofs,
+					     seg_len - seg_ofs);
+			if (left_on_page <= 0)
+				return -EFAULT;
+
+			if (left_on_page > pkt_left) {
+				sg_set_page(sg, plain_pages[page_i], pkt_left,
+					    page_ofs);
+				pkt_i++;
+				seg_ofs += pkt_left;
+				page_ofs += pkt_left;
+				sg_mark_end(sg);
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+				continue;
+			}
+			sg_set_page(sg, plain_pages[page_i], left_on_page,
+				    page_ofs);
+			page_i++;
+			page_ofs = 0;
+			seg_ofs += left_on_page;
+			pkt_left -= left_on_page;
+			if (pkt_left == 0 ||
+			    (seg_ofs == seg_len &&
+			     i == msg->msg_iter.nr_segs - 1)) {
+				sg_mark_end(sg);
+				pkt_i++;
+				sg++;
+				sg_pkts[pkt_i] = sg;
+				pkt_left = pkt_size;
+			} else {
+				sg++;
+			}
+		}
+	}
+
+	if (pkt_left && pkt_left != pkt_size) {
+		pkt_i++;
+		sg_mark_end(sg);
+	}
+	ret = pkt_i;
+
+out:
+	return ret;
+}
+
+/* sg_alloc: allocated zeroed array of scatterlists
+ * cipher_page: preallocated compound page
+ */
+static int quic_sg_cipher_from_pkts(const size_t cipher_tag_size,
+				    const size_t plain_pkt_size,
+				    const size_t plain_size,
+				    struct page * const cipher_page,
+				    struct scatterlist * const sg_alloc,
+				    const size_t nr_sg_alloc,
+				    struct scatterlist ** const sg_cipher)
+{
+	const size_t cipher_pkt_size = plain_pkt_size + cipher_tag_size;
+	size_t pkts = DIV_ROUND_UP(plain_size, plain_pkt_size);
+	struct scatterlist *sg = sg_alloc;
+	int pkt_i;
+	void *ptr;
+
+	if (pkts > nr_sg_alloc)
+		return -EINVAL;
+
+	ptr = page_address(cipher_page);
+	for (pkt_i = 0; pkt_i < pkts;
+		++pkt_i, ptr += cipher_pkt_size, ++sg) {
+		sg_set_buf(sg, ptr, cipher_pkt_size);
+		sg_mark_end(sg);
+		sg_cipher[pkt_i] = sg;
+	}
+	return pkts;
+}
+
+/* fast copy from scatterlist to a buffer assuming that all pages are
+ * available in kernel memory.
+ */
+static int quic_sg_pcopy_to_buffer_kernel(struct scatterlist *sg,
+					  u8 *buffer,
+					  size_t bytes_to_copy,
+					  off_t offset_to_read)
+{
+	off_t sg_remain = sg->length;
+	size_t to_copy;
+
+	if (!bytes_to_copy)
+		return 0;
+
+	/* skip to offset first */
+	while (offset_to_read > 0) {
+		if (!sg_remain)
+			return -EINVAL;
+		if (offset_to_read < sg_remain) {
+			sg_remain -= offset_to_read;
+			break;
+		}
+		offset_to_read -= sg_remain;
+		sg = sg_next(sg);
+		if (!sg)
+			return -EINVAL;
+		sg_remain = sg->length;
+	}
+
+	/* traverse sg list from offset to offset + bytes_to_copy */
+	while (bytes_to_copy) {
+		to_copy = min_t(size_t, bytes_to_copy, sg_remain);
+		if (!to_copy)
+			return -EINVAL;
+		memcpy(buffer, sg_virt(sg) + (sg->length - sg_remain), to_copy);
+		buffer += to_copy;
+		bytes_to_copy -= to_copy;
+		if (bytes_to_copy) {
+			sg = sg_next(sg);
+			if (!sg)
+				return -EINVAL;
+			sg_remain = sg->length;
+		}
+	}
+
+	return 0;
+}
+
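+/* Copy the QUIC short header out of the plaintext: one flags byte, the
+ * destination connection ID, and a 1-4 byte packet number whose length is
+ * encoded in the two low bits of the flags byte.
+ */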
+static int quic_copy_header(struct scatterlist *sg_plain,
+			    u8 *buf, const size_t buf_len,
+			    const size_t conn_id_len)
+{
+	u8 *pkt = sg_virt(sg_plain);
+	size_t hdr_len;
+
+	hdr_len = 1 + conn_id_len + ((*pkt & 0x03) + 1);
+	if (hdr_len > QUIC_MAX_SHORT_HEADER_SIZE || hdr_len > buf_len)
+		return -EINVAL;
+
+	WARN_ON_ONCE(quic_sg_pcopy_to_buffer_kernel(sg_plain, buf, hdr_len, 0));
+	return hdr_len;
+}
+
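+/* Reconstruct the full packet number from the truncated value carried in the
+ * header: pick the candidate closest to the expected next packet number
+ * within a window of 2^(8 * pn_len), following the sample decoding algorithm
+ * in RFC 9000, Appendix A.3.
+ */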
+static u64 quic_unpack_pkt_num(struct quic_tx_ancillary_data * const control,
+			       const u8 * const hdr,
+			       const off_t payload_crypto_off)
+{
+	u64 truncated_pn = 0;
+	u64 candidate_pn;
+	u64 expected_pn;
+	u64 pn_hwin;
+	u64 pn_mask;
+	u64 pn_len;
+	u64 pn_win;
+	int i;
+
+	pn_len = (hdr[0] & 0x03) + 1;
+	expected_pn = control->next_pkt_num;
+
+	for (i = 1 + control->dst_conn_id_length; i < payload_crypto_off; ++i) {
+		truncated_pn <<= 8;
+		truncated_pn |= hdr[i];
+	}
+
+	pn_win = 1ULL << (pn_len << 3);
+	pn_hwin = pn_win >> 1;
+	pn_mask = pn_win - 1;
+	candidate_pn = (expected_pn & ~pn_mask) | truncated_pn;
+
+	if (expected_pn > pn_hwin &&
+	    candidate_pn <= expected_pn - pn_hwin &&
+	    candidate_pn < (1ULL << 62) - pn_win)
+		return candidate_pn + pn_win;
+
+	if (candidate_pn > expected_pn + pn_hwin &&
+	    candidate_pn >= pn_win)
+		return candidate_pn - pn_win;
+
+	return candidate_pn;
+}
+
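+/* Derive the header protection mask from a 16-byte sample of the ciphertext
+ * (RFC 9001, Section 5.4): AES ciphers encrypt the sample with ECB; ChaCha20
+ * uses the first four sample bytes as the counter and the remaining bytes as
+ * the nonce over a block of zeros.
+ */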
+static int
+quic_construct_header_prot_mask(struct quic_internal_crypto_context *crypto_ctx,
+				struct skcipher_request *hdr_mask_req,
+				struct scatterlist *sg_cipher_pkt,
+				off_t sample_offset,
+				u8 *hdr_mask)
+{
+	u8 *sample = sg_virt(sg_cipher_pkt) + sample_offset;
+	u8 hdr_ctr[sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE];
+	u8 chacha20_zeros[5] = {0, 0, 0, 0, 0};
+	struct scatterlist sg_cipher_sample;
+	struct scatterlist sg_hdr_mask;
+	struct crypto_wait wait_header;
+	__le32	counter;
+
+	BUILD_BUG_ON(QUIC_HDR_MASK_SIZE
+		     < sizeof(u32) + QUIC_CIPHER_MAX_IV_SIZE);
+
+	sg_init_one(&sg_hdr_mask, hdr_mask, QUIC_HDR_MASK_SIZE);
+	skcipher_request_set_callback(hdr_mask_req, 0, crypto_req_done,
+				      &wait_header);
+
+	if (crypto_ctx->conn_info.cipher_type == TLS_CIPHER_CHACHA20_POLY1305) {
+		sg_init_one(&sg_cipher_sample, (u8 *)chacha20_zeros,
+			    sizeof(chacha20_zeros));
+		counter = cpu_to_le32(*((u32 *)sample));
+		memset(hdr_ctr, 0, sizeof(hdr_ctr));
+		memcpy((u8 *)hdr_ctr, (u8 *)&counter, sizeof(u32));
+		memcpy((u8 *)hdr_ctr + sizeof(u32),
+		       (sample + sizeof(u32)),
+		       QUIC_CIPHER_MAX_IV_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, 5, hdr_ctr);
+	} else {
+		/* Cipher pages are contiguous and allocated in the kernel, so
+		 * get the pointer to the sg data directly.
+		 */
+		sg_init_one(&sg_cipher_sample, sample, QUIC_HDR_MASK_SIZE);
+		skcipher_request_set_crypt(hdr_mask_req, &sg_cipher_sample,
+					   &sg_hdr_mask, QUIC_HDR_MASK_SIZE,
+					   NULL);
+	}
+
+	return crypto_wait_req(crypto_skcipher_encrypt(hdr_mask_req),
+			       &wait_header);
+}
+
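+/* Apply header protection: XOR the five low bits of the first (short header)
+ * byte and the packet number bytes with the mask derived from the ciphertext
+ * sample.
+ */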
+static int quic_protect_header(struct quic_internal_crypto_context *crypto_ctx,
+			       struct quic_tx_ancillary_data *control,
+			       struct skcipher_request *hdr_mask_req,
+			       struct scatterlist *sg_cipher_pkt,
+			       int payload_crypto_off)
+{
+	u8 hdr_mask[QUIC_HDR_MASK_SIZE];
+	off_t quic_pkt_num_off;
+	u8 quic_pkt_num_len;
+	u8 *cipher_hdr;
+	int err;
+	int i;
+
+	quic_pkt_num_off = 1 + control->dst_conn_id_length;
+	quic_pkt_num_len = payload_crypto_off - quic_pkt_num_off;
+
+	if (quic_pkt_num_len > 4)
+		return -EPERM;
+
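+	/* Per RFC 9001, the header protection sample starts 4 bytes after the
+	 * beginning of the Packet Number field, i.e. at offset
+	 * payload_crypto_off + (4 - quic_pkt_num_len).
+	 */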
+	err = quic_construct_header_prot_mask(crypto_ctx, hdr_mask_req,
+					      sg_cipher_pkt,
+					      payload_crypto_off +
+					      (4 - quic_pkt_num_len),
+					      hdr_mask);
+	if (unlikely(err))
+		return err;
+
+	cipher_hdr = sg_virt(sg_cipher_pkt);
+	/* protect the public flags */
+	cipher_hdr[0] ^= (hdr_mask[0] & 0x1f);
+
+	for (i = 0; i < quic_pkt_num_len; ++i)
+		cipher_hdr[quic_pkt_num_off + i] ^= hdr_mask[1 + i];
+
+	return 0;
+}
+
+static
+void quic_construct_ietf_nonce(u8 *nonce,
+			       struct quic_internal_crypto_context *crypto_ctx,
+			       u64 quic_pkt_num)
+{
+	u8 *iv = quic_payload_iv(crypto_ctx);
+	int i;
+
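+	/* RFC 9001 AEAD nonce: the packet number, left-padded with zeros to
+	 * the IV length, XORed with the payload IV.
+	 */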
+	for (i = quic_crypto_nonce_size(crypto_ctx->conn_info.cipher_type) - 1;
+	     i >= 0 && quic_pkt_num;
+	     --i, quic_pkt_num >>= 8)
+		nonce[i] = iv[i] ^ (u8)quic_pkt_num;
+
+	for (; i >= 0; --i)
+		nonce[i] = iv[i];
+}
+
+static ssize_t quic_sendpage(struct quic_context *ctx,
+			     struct sock *sk,
+			     struct msghdr *msg,
+			     const size_t cipher_size,
+			     struct page * const cipher_page)
+{
+	struct kvec iov;
+	ssize_t ret;
+
+	iov.iov_base = page_address(cipher_page);
+	iov.iov_len = cipher_size;
+	iov_iter_kvec(&msg->msg_iter, WRITE, &iov, 1, cipher_size);
+	ret = security_socket_sendmsg(sk->sk_socket, msg, msg_data_left(msg));
+	if (ret)
+		return ret;
+
+	ret = ctx->sk_proto->sendmsg(sk, msg, msg_data_left(msg));
+	WARN_ON(ret == -EIOCBQUEUED);
+	return ret;
+}
+
+static int quic_extract_dst_address_info(struct sock *sk, struct msghdr *msg,
+					 sa_family_t *sa_family, void **daddr,
+					  __be16 *dport)
+{
+	DECLARE_SOCKADDR(struct sockaddr_in6 *, usin6, msg->msg_name);
+	DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
+	struct inet_sock *inet = inet_sk(sk);
+	struct ipv6_pinfo *np = inet6_sk(sk);
+
+	if (usin6) {
+		/* dst address is provided in msg */
+		*sa_family = usin6->sin6_family;
+		switch (*sa_family) {
+		case AF_INET:
+			if (msg->msg_namelen < sizeof(*usin))
+				return -EINVAL;
+			*daddr = &usin->sin_addr.s_addr;
+			*dport = usin->sin_port;
+			break;
+		case AF_INET6:
+			if (msg->msg_namelen < sizeof(*usin6))
+				return -EINVAL;
+			*daddr = &usin6->sin6_addr;
+			*dport = usin6->sin6_port;
+			break;
+		default:
+			return -EAFNOSUPPORT;
+		}
+	} else {
+		/* socket should be connected */
+		if (sk->sk_state != TCP_ESTABLISHED)
+			return -EDESTADDRREQ;
+		if (np) {
+			*sa_family = AF_INET6;
+			*daddr = &sk->sk_v6_daddr;
+			*dport = inet->inet_dport;
+		} else if (inet) {
+			*sa_family = AF_INET;
+			*daddr = &sk->sk_daddr;
+			*dport = inet->inet_dport;
+		} else {
+			return -EAFNOSUPPORT;
+		}
+	}
+
+	if (!*dport || !*daddr)
+		return -EINVAL;
+
+	if (*sa_family == AF_INET6 &&
+	    ipv6_addr_v4mapped((struct in6_addr *)(*daddr))) {
+		*daddr = &((struct in6_addr *)(*daddr))->s6_addr32[3];
+		*sa_family = AF_INET;
+	}
+
+	return 0;
+}
+
+static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_internal_crypto_context *crypto_ctx = NULL;
+	struct scatterlist *sg_cipher_pkts[QUIC_MAX_GSO_FRAGS];
+	struct scatterlist *sg_plain_pkts[QUIC_MAX_GSO_FRAGS];
+	struct page *plain_pages[QUIC_MAX_PLAIN_PAGES];
+	void *plain_base_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	void *plain_data_ptrs[QUIC_MAX_IOVEC_SEGMENTS];
+	struct msghdr msg_cipher = {
+		.msg_name = msg->msg_name,
+		.msg_namelen = msg->msg_namelen,
+		.msg_flags = msg->msg_flags,
+		.msg_control = msg->msg_control,
+		.msg_controllen = msg->msg_controllen,
+	};
+	struct quic_connection_rhash *connhash = NULL;
+	struct quic_context *ctx = quic_get_ctx(sk);
+	u8 hdr_buf[QUIC_MAX_SHORT_HEADER_SIZE];
+	struct skcipher_request *hdr_mask_req;
+	struct quic_tx_ancillary_data control;
+	struct	aead_request *aead_req = NULL;
+	u8 nonce[QUIC_CIPHER_MAX_NONCE_SIZE];
+	struct scatterlist *sg_cipher = NULL;
+	struct udp_sock *up = udp_sk(sk);
+	struct scatterlist *sg_plain = NULL;
+	u16 gso_pkt_size = up->gso_size;
+	size_t last_plain_pkt_size = 0;
+	off_t	payload_crypto_offset;
+	struct crypto_aead *tfm = NULL;
+	size_t nr_plain_pages = 0;
+	struct crypto_wait waiter;
+	size_t nr_sg_cipher_pkts;
+	size_t nr_sg_plain_pkts;
+	u8 conn_payload_key_gen;
+	ssize_t hdr_buf_len = 0;
+	size_t nr_sg_alloc = 0;
+	size_t plain_pkt_size;
+	sa_family_t sa_family;
+	u64	full_pkt_num;
+	size_t cipher_size;
+	size_t plain_size;
+	ssize_t pkt_size;
+	size_t tag_size;
+	__be16 dport;
+	int ret = 0;
+	void *daddr;
+	int pkt_i;
+	int err;
+
+	memset(&hdr_buf[0], 0, QUIC_MAX_SHORT_HEADER_SIZE);
+	hdr_buf_len = copy_from_iter(hdr_buf, QUIC_MAX_SHORT_HEADER_SIZE,
+				     &msg->msg_iter);
+	if (hdr_buf_len <= 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+	iov_iter_revert(&msg->msg_iter, hdr_buf_len);
+
+	// Bypass for anything that is guaranteed not QUIC.
+	plain_size = len;
+
+	if (plain_size < 2)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Bypass for other than short header.
+	if ((hdr_buf[0] & 0xc0) != 0x40)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	// Crypto adds a tag after the packet. Corking a payload would produce
+	// a crypto tag after each portion. Use GSO instead.
+	if ((msg->msg_flags & MSG_MORE) || up->pending) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = quic_sendmsg_validate(msg);
+	if (ret)
+		goto out;
+
+	ret = quic_extract_ancillary_data(msg, &control, &gso_pkt_size);
+	if (ret)
+		goto out;
+
+	// Unknown/reserved bits set in the ancillary flags are an error.
+	if (control.flags & ~QUIC_ANCILLARY_FLAGS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// Bypass offload on request. A bypass requested on the first packet
+	// applies to all packets in the GSO batch.
+	if (control.flags & QUIC_BYPASS_ENCRYPTION)
+		return ctx->sk_proto->sendmsg(sk, msg, len);
+
+	if (hdr_buf_len < 1 + control.dst_conn_id_length) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	conn_payload_key_gen = (hdr_buf[0] & 0x04) >> 2;
+
+	ret = quic_extract_dst_address_info(sk, msg, &sa_family, &daddr,
+					    &dport);
+	if (ret)
+		goto out;
+
+	// Fetch the flow
+	connhash = quic_lookup_connection(ctx, &hdr_buf[1], &control,
+					  sa_family, daddr, dport);
+	if (!connhash) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	crypto_ctx = &connhash->crypto_ctx;
+	tag_size = quic_crypto_tag_size(crypto_ctx->conn_info.cipher_type);
+
+	if (crypto_ctx->conn_info.conn_payload_key_gen !=
+	    conn_payload_key_gen) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	// For GSO, use the GSO size minus cipher tag size as the packet size;
+	// for non-GSO, use the size of the whole plaintext.
+	// Reduce the packet size by tag size to keep the original packet size
+	// for the rest of the UDP path in the stack.
+	if (!gso_pkt_size) {
+		plain_pkt_size = plain_size;
+	} else {
+		if (gso_pkt_size <= tag_size) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		plain_pkt_size = gso_pkt_size - tag_size;
+	}
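+	// Example: a 9000 byte plaintext with UDP_SEGMENT of 1200 and a
+	// 16 byte AEAD tag is split into seven 1184 byte chunks plus a
+	// 712 byte tail, which encrypt to 1200 and 728 byte wire segments.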
+
+	// Build scatterlist from the input data, split by GSO minus the
+	// crypto tag size.
+	nr_sg_alloc = quic_sg_capacity_from_msg(plain_pkt_size,
+						msg->msg_iter.iov_offset,
+						plain_size);
+	if ((nr_sg_alloc * 2) >= QUIC_MAX_SG_ALLOC_ELEMENTS) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	sg_plain = ctx->sg_alloc;
+	sg_cipher = sg_plain + nr_sg_alloc;
+
+	ret = quic_sg_plain_from_mapped_msg(msg, plain_pages,
+					    plain_base_ptrs,
+					    plain_data_ptrs, plain_size,
+					    plain_pkt_size, sg_plain,
+					    nr_sg_alloc, sg_plain_pkts,
+					    &nr_plain_pages);
+
+	if (ret < 0)
+		goto out;
+
+	nr_sg_plain_pkts = ret;
+	last_plain_pkt_size = plain_size % plain_pkt_size;
+	if (!last_plain_pkt_size)
+		last_plain_pkt_size = plain_pkt_size;
+
+	// Build scatterlist for the ciphertext, split by GSO.
+	cipher_size = plain_size + nr_sg_plain_pkts * tag_size;
+
+	if (DIV_ROUND_UP(cipher_size, PAGE_SIZE)
+	    >= (1 << QUIC_MAX_CIPHER_PAGES_ORDER)) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	ret = quic_sg_cipher_from_pkts(tag_size, plain_pkt_size, plain_size,
+				       ctx->cipher_page, sg_cipher, nr_sg_alloc,
+				       sg_cipher_pkts);
+	if (ret < 0)
+		goto out_put_pages;
+
+	nr_sg_cipher_pkts = ret;
+
+	if (nr_sg_plain_pkts != nr_sg_cipher_pkts) {
+		ret = -EPERM;
+		goto out_put_pages;
+	}
+
+	// Encrypt and protect header for each packet individually.
+	tfm = crypto_ctx->packet_aead;
+	crypto_aead_clear_flags(tfm, ~0);
+	aead_req = aead_request_alloc(tfm, GFP_KERNEL);
+	if (!aead_req) {
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	hdr_mask_req = skcipher_request_alloc(crypto_ctx->header_tfm,
+					      GFP_KERNEL);
+	if (!hdr_mask_req) {
+		aead_request_free(aead_req);
+		ret = -ENOMEM;
+		goto out_put_pages;
+	}
+
+	for (pkt_i = 0; pkt_i < nr_sg_plain_pkts; ++pkt_i) {
+		payload_crypto_offset =
+			quic_copy_header(sg_plain_pkts[pkt_i],
+					 hdr_buf,
+					 sizeof(hdr_buf),
+					 control.dst_conn_id_length);
+
+		full_pkt_num = quic_unpack_pkt_num(&control, hdr_buf,
+						   payload_crypto_offset);
+
+		pkt_size = (pkt_i + 1 < nr_sg_plain_pkts
+				? plain_pkt_size
+				: last_plain_pkt_size)
+			    - payload_crypto_offset;
+		if (pkt_size < 0) {
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+
+		/* Construct nonce and initialize request */
+		quic_construct_ietf_nonce(nonce, crypto_ctx, full_pkt_num);
+
+		/* Encrypt the body */
+		aead_request_set_callback(aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG
+					  | CRYPTO_TFM_REQ_MAY_SLEEP,
+					  crypto_req_done, &waiter);
+		aead_request_set_crypt(aead_req, sg_plain_pkts[pkt_i],
+				       sg_cipher_pkts[pkt_i],
+				       pkt_size,
+				       nonce);
+		aead_request_set_ad(aead_req, payload_crypto_offset);
+		err = crypto_wait_req(crypto_aead_encrypt(aead_req), &waiter);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+
+		/* Protect the header */
+		memcpy(sg_virt(sg_cipher_pkts[pkt_i]), hdr_buf,
+		       payload_crypto_offset);
+
+		err = quic_protect_header(crypto_ctx, &control,
+					  hdr_mask_req,
+					  sg_cipher_pkts[pkt_i],
+					  payload_crypto_offset);
+		if (unlikely(err)) {
+			ret = err;
+			aead_request_free(aead_req);
+			skcipher_request_free(hdr_mask_req);
+			goto out_put_pages;
+		}
+	}
+	skcipher_request_free(hdr_mask_req);
+	aead_request_free(aead_req);
+
+	// Deliver to the next layer.
+	if (ctx->sk_proto->sendpage) {
+		msg_cipher.msg_flags |= MSG_MORE;
+		err = ctx->sk_proto->sendmsg(sk, &msg_cipher, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+
+		err = ctx->sk_proto->sendpage(sk, ctx->cipher_page, 0,
+					      cipher_size, 0);
+		if (err < 0) {
+			ret = err;
+			goto out_put_pages;
+		}
+		if (err != cipher_size) {
+			ret = -EINVAL;
+			goto out_put_pages;
+		}
+		ret = plain_size;
+	} else {
+		ret = quic_sendpage(ctx, sk, &msg_cipher, cipher_size,
+				    ctx->cipher_page);
+		// indicate full plaintext transmission to the caller.
+		if (ret > 0)
+			ret = plain_size;
+	}
+
+out_put_pages:
+	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
+
+out:
+	return ret;
+}
+
+static int quic_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	struct quic_context *ctx;
+	int ret;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	rcu_read_unlock();
+	if (!ctx)
+		return -EINVAL;
+
+	mutex_lock(&ctx->sendmsg_mux);
+	ret = quic_sendmsg(sk, msg, len);
+	mutex_unlock(&ctx->sendmsg_mux);
+	return ret;
+}
+
+static void quic_release_resources(struct sock *sk)
+{
+	struct quic_internal_crypto_context *crypto_ctx;
+	struct quic_connection_rhash *connhash;
+	struct inet_sock *inet = inet_sk(sk);
+	struct rhashtable_iter hti;
+	struct quic_context *ctx;
+	struct proto *sk_proto;
+
+	rcu_read_lock();
+	ctx = quic_get_ctx(sk);
+	if (!ctx) {
+		rcu_read_unlock();
+		return;
+	}
+
+	sk_proto = ctx->sk_proto;
+
+	rhashtable_walk_enter(&ctx->tx_connections, &hti);
+	rhashtable_walk_start(&hti);
+
+	while ((connhash = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(connhash)) {
+			if (PTR_ERR(connhash) == -EAGAIN)
+				continue;
+			break;
+		}
+
+		crypto_ctx = &connhash->crypto_ctx;
+		crypto_free_aead(crypto_ctx->packet_aead);
+		crypto_free_skcipher(crypto_ctx->header_tfm);
+		memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
+	}
+
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+	rhashtable_destroy(&ctx->tx_connections);
+
+	if (ctx->cipher_page) {
+		quic_free_cipher_page(ctx->cipher_page);
+		ctx->cipher_page = NULL;
+	}
+
+	rcu_read_unlock();
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, NULL);
+	WRITE_ONCE(sk->sk_prot, sk_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+
+	kfree_rcu(ctx, rcu);
+}
+
+static void
+quic_prep_protos(unsigned int af, struct proto *proto, const struct proto *base)
+{
+	if (likely(test_bit(af, &af_init_done)))
+		return;
+
+	spin_lock(&quic_proto_lock);
+	if (test_bit(af, &af_init_done))
+		goto out_unlock;
+
+	*proto			= *base;
+	proto->setsockopt	= quic_setsockopt;
+	proto->getsockopt	= quic_getsockopt;
+	proto->sendmsg		= quic_sendmsg_locked;
+
+	smp_mb__before_atomic(); /* proto calls should be visible first */
+	set_bit(af, &af_init_done);
+
+out_unlock:
+	spin_unlock(&quic_proto_lock);
+}
+
+static void quic_update_proto(struct sock *sk, struct quic_context *ctx)
+{
+	struct proto *udp_proto, *quic_proto;
+	struct inet_sock *inet = inet_sk(sk);
+
+	udp_proto = READ_ONCE(sk->sk_prot);
+	ctx->sk_proto = udp_proto;
+	quic_proto = sk->sk_family == AF_INET ? &quic_v4_proto : &quic_v6_proto;
+
+	quic_prep_protos(sk->sk_family, quic_proto, udp_proto);
+
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_pointer(inet->ulp_data, ctx);
+	WRITE_ONCE(sk->sk_prot, quic_proto);
+	write_unlock_bh(&sk->sk_callback_lock);
+}
+
+static int quic_init(struct sock *sk)
+{
+	struct quic_context *ctx;
+
+	ctx = quic_ctx_create();
+	if (!ctx)
+		return -ENOMEM;
+
+	quic_update_proto(sk, ctx);
+
+	return 0;
+}
+
+static void quic_release(struct sock *sk)
+{
+	lock_sock(sk);
+	quic_release_resources(sk);
+	release_sock(sk);
+}
+
+static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
+	.name		= "quic-crypto",
+	.owner		= THIS_MODULE,
+	.init		= quic_init,
+	.release	= quic_release,
+};
+
+static int __init quic_register(void)
+{
+	udp_register_ulp(&quic_ulp_ops);
+	return 0;
+}
+
+static void __exit quic_unregister(void)
+{
+	udp_unregister_ulp(&quic_ulp_ops);
+}
+
+module_init(quic_register);
+module_exit(quic_unregister);
+
+MODULE_DESCRIPTION("QUIC crypto ULP");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_UDP_ULP("quic-crypto");
diff --git a/security/security.c b/security/security.c
index 4b95de24bc8d..9774cbeec330 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2222,6 +2222,7 @@ int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size)
 {
 	return call_int_hook(socket_sendmsg, 0, sock, msg, size);
 }
+EXPORT_SYMBOL(security_socket_sendmsg);
 
 int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
 			    int size, int flags)
-- 
2.30.2



* [net-next v4 5/6] net: Add flow counters and Tx processing error counter
  2022-09-09  0:12 ` [net-next v4 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (3 preceding siblings ...)
  2022-09-09  0:12   ` [net-next v4 4/6] net: Implement QUIC offload functions Adel Abouchaev
@ 2022-09-09  0:12   ` Adel Abouchaev
  2022-09-09  0:12   ` [net-next v4 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-09  0:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, paul,
	jmorris, serge, linux-security-module, netdev, linux-doc,
	linux-kselftest

Add flow counters. The total flow counter is cumulative, the current
counter shows the number of flows currently in flight, and the error
counter accumulates the number of errors seen during Tx processing.
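
The counters are exported through /proc/net/quic_stat. A sketch of the
expected output (counter names and layout follow quic_proc.c below;
the values are illustrative only):

  QuicCurrTxSw      2
  QuicTxSw          5
  QuicTxSwError     0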

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Updated enum bracket to follow enum keyword. Removed extra blank lines.
---
 include/net/netns/mib.h   |  3 +++
 include/net/quic.h        | 10 +++++++++
 include/net/snmp.h        |  6 +++++
 include/uapi/linux/snmp.h |  9 ++++++++
 net/quic/Makefile         |  2 +-
 net/quic/quic_main.c      | 46 +++++++++++++++++++++++++++++++++++++++
 net/quic/quic_proc.c      | 45 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 net/quic/quic_proc.c

diff --git a/include/net/netns/mib.h b/include/net/netns/mib.h
index 7e373664b1e7..dcbba3d1ceec 100644
--- a/include/net/netns/mib.h
+++ b/include/net/netns/mib.h
@@ -24,6 +24,9 @@ struct netns_mib {
 #if IS_ENABLED(CONFIG_TLS)
 	DEFINE_SNMP_STAT(struct linux_tls_mib, tls_statistics);
 #endif
+#if IS_ENABLED(CONFIG_QUIC)
+	DEFINE_SNMP_STAT(struct linux_quic_mib, quic_statistics);
+#endif
 #ifdef CONFIG_MPTCP
 	DEFINE_SNMP_STAT(struct mptcp_mib, mptcp_statistics);
 #endif
diff --git a/include/net/quic.h b/include/net/quic.h
index cafe01174e60..6362d827d266 100644
--- a/include/net/quic.h
+++ b/include/net/quic.h
@@ -25,6 +25,16 @@
 #define QUIC_MAX_PLAIN_PAGES		16
 #define QUIC_MAX_CIPHER_PAGES_ORDER	4
 
+#define __QUIC_INC_STATS(net, field)				\
+	__SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_INC_STATS(net, field)				\
+	SNMP_INC_STATS((net)->mib.quic_statistics, field)
+#define QUIC_DEC_STATS(net, field)				\
+	SNMP_DEC_STATS((net)->mib.quic_statistics, field)
+
+int __net_init quic_proc_init(struct net *net);
+void __net_exit quic_proc_fini(struct net *net);
+
 struct quic_internal_crypto_context {
 	struct quic_connection_info	conn_info;
 	struct crypto_skcipher		*header_tfm;
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 468a67836e2f..f94680a3e9e8 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -117,6 +117,12 @@ struct linux_tls_mib {
 	unsigned long	mibs[LINUX_MIB_TLSMAX];
 };
 
+/* Linux QUIC */
+#define LINUX_MIB_QUICMAX	__LINUX_MIB_QUICMAX
+struct linux_quic_mib {
+	unsigned long	mibs[LINUX_MIB_QUICMAX];
+};
+
 #define DEFINE_SNMP_STAT(type, name)	\
 	__typeof__(type) __percpu *name
 #define DEFINE_SNMP_STAT_ATOMIC(type, name)	\
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 4d7470036a8b..ca1e626dbdb4 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -349,4 +349,13 @@ enum
 	__LINUX_MIB_TLSMAX
 };
 
+/* linux QUIC mib definitions */
+enum {
+	LINUX_MIB_QUICNUM = 0,
+	LINUX_MIB_QUICCURRTXSW,			/* QuicCurrTxSw */
+	LINUX_MIB_QUICTXSW,			/* QuicTxSw */
+	LINUX_MIB_QUICTXSWERROR,		/* QuicTxSwError */
+	__LINUX_MIB_QUICMAX
+};
+
 #endif	/* _LINUX_SNMP_H */
diff --git a/net/quic/Makefile b/net/quic/Makefile
index 928239c4d08c..a885cd8bc4e0 100644
--- a/net/quic/Makefile
+++ b/net/quic/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_QUIC) += quic.o
 
-quic-y := quic_main.o
+quic-y := quic_main.o quic_proc.o
diff --git a/net/quic/quic_main.c b/net/quic/quic_main.c
index 32535f7b7f11..bf041db280f8 100644
--- a/net/quic/quic_main.c
+++ b/net/quic/quic_main.c
@@ -359,6 +359,8 @@ static int do_quic_conn_add_tx(struct sock *sk, sockptr_t optval,
 	if (rc < 0)
 		goto err_free_ciphers;
 
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
+	QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSW);
 	return 0;
 
 err_free_ciphers:
@@ -416,6 +418,7 @@ static int do_quic_conn_del_tx(struct sock *sk, sockptr_t optval,
 	crypto_free_aead(crypto_ctx->packet_aead);
 	memzero_explicit(crypto_ctx, sizeof(*crypto_ctx));
 	kfree(connhash);
+	QUIC_DEC_STATS(sock_net(sk), LINUX_MIB_QUICCURRTXSW);
 
 	return 0;
 }
@@ -441,6 +444,9 @@ static int do_quic_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 		break;
 	}
 
+	if (rc)
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return rc;
 }
 
@@ -1329,6 +1335,9 @@ static int quic_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	quic_put_plain_user_pages(plain_pages, nr_plain_pages);
 
 out:
+	if (unlikely(ret < 0))
+		QUIC_INC_STATS(sock_net(sk), LINUX_MIB_QUICTXSWERROR);
+
 	return ret;
 }
 
@@ -1461,6 +1470,36 @@ static void quic_release(struct sock *sk)
 	release_sock(sk);
 }
 
+static int __net_init quic_init_net(struct net *net)
+{
+	int err;
+
+	net->mib.quic_statistics = alloc_percpu(struct linux_quic_mib);
+	if (!net->mib.quic_statistics)
+		return -ENOMEM;
+
+	err = quic_proc_init(net);
+	if (err)
+		goto err_free_stats;
+
+	return 0;
+
+err_free_stats:
+	free_percpu(net->mib.quic_statistics);
+	return err;
+}
+
+static void __net_exit quic_exit_net(struct net *net)
+{
+	quic_proc_fini(net);
+	free_percpu(net->mib.quic_statistics);
+}
+
+static struct pernet_operations quic_proc_ops = {
+	.init = quic_init_net,
+	.exit = quic_exit_net,
+};
+
 static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 	.name		= "quic-crypto",
 	.owner		= THIS_MODULE,
@@ -1470,6 +1509,12 @@ static struct udp_ulp_ops quic_ulp_ops __read_mostly = {
 
 static int __init quic_register(void)
 {
+	int err;
+
+	err = register_pernet_subsys(&quic_proc_ops);
+	if (err)
+		return err;
+
 	udp_register_ulp(&quic_ulp_ops);
 	return 0;
 }
@@ -1477,6 +1522,7 @@ static int __init quic_register(void)
 static void __exit quic_unregister(void)
 {
 	udp_unregister_ulp(&quic_ulp_ops);
+	unregister_pernet_subsys(&quic_proc_ops);
 }
 
 module_init(quic_register);
diff --git a/net/quic/quic_proc.c b/net/quic/quic_proc.c
new file mode 100644
index 000000000000..cb4fe7a589b5
--- /dev/null
+++ b/net/quic/quic_proc.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Meta Platforms, Inc. */
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <net/snmp.h>
+#include <net/quic.h>
+
+#ifdef CONFIG_PROC_FS
+static const struct snmp_mib quic_mib_list[] = {
+	SNMP_MIB_ITEM("QuicCurrTxSw", LINUX_MIB_QUICCURRTXSW),
+	SNMP_MIB_ITEM("QuicTxSw", LINUX_MIB_QUICTXSW),
+	SNMP_MIB_ITEM("QuicTxSwError", LINUX_MIB_QUICTXSWERROR),
+	SNMP_MIB_SENTINEL
+};
+
+static int quic_statistics_seq_show(struct seq_file *seq, void *v)
+{
+	unsigned long buf[LINUX_MIB_QUICMAX] = {};
+	struct net *net = seq->private;
+	int i;
+
+	snmp_get_cpu_field_batch(buf, quic_mib_list, net->mib.quic_statistics);
+	for (i = 0; quic_mib_list[i].name; i++)
+		seq_printf(seq, "%-32s\t%lu\n", quic_mib_list[i].name, buf[i]);
+
+	return 0;
+}
+#endif
+
+int __net_init quic_proc_init(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	if (!proc_create_net_single("quic_stat", 0444, net->proc_net,
+				    quic_statistics_seq_show, NULL))
+		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
+
+	return 0;
+}
+
+void __net_exit quic_proc_fini(struct net *net)
+{
+	remove_proc_entry("quic_stat", net->proc_net);
+}
-- 
2.30.2



* [net-next v4 6/6] net: Add self tests for ULP operations, flow setup and crypto tests
  2022-09-09  0:12 ` [net-next v4 0/6] net: support QUIC crypto Adel Abouchaev
                     ` (4 preceding siblings ...)
  2022-09-09  0:12   ` [net-next v4 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
@ 2022-09-09  0:12   ` Adel Abouchaev
  5 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-09  0:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, corbet, dsahern, shuah, paul,
	jmorris, serge, linux-security-module, netdev, linux-doc,
	linux-kselftest

Add self tests for ULP operations, flow setup and crypto tests.
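
The fixtures expect network namespaces ns11, ns12 and ns2 to already be
present under /var/run/netns; quic.sh presumably sets these up before
invoking the test binary. A typical kselftest run would be along the
lines of:

  make -C tools/testing/selftests TARGETS=net run_tests

or executing tools/testing/selftests/net/quic.sh directly.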

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>

---

Restored the test build. Changed the QUIC context reference variable
names for the keys and iv to match the uAPI.

Updated alignment, added SPDX license line.

v3: Added Chacha20-Poly1305 test.
v3: Added test to fail sending with wrong key generation bit.
---
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    3 +-
 tools/testing/selftests/net/quic.c     | 1369 ++++++++++++++++++++++++
 tools/testing/selftests/net/quic.sh    |   46 +
 4 files changed, 1418 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/quic.c
 create mode 100755 tools/testing/selftests/net/quic.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 3d7adee7a3e6..78970a09d73c 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -14,6 +14,7 @@ nettest
 psock_fanout
 psock_snd
 psock_tpacket
+quic
 reuseaddr_conflict
 reuseaddr_ports_exhausted
 reuseport_addr_any
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index f5ac1433c301..b4e9586a2d03 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -44,6 +44,7 @@ TEST_PROGS += arp_ndisc_untracked_subnets.sh
 TEST_PROGS += stress_reuseport_listen.sh
 TEST_PROGS += l2_tos_ttl_inherit.sh
 TEST_PROGS += bind_bhash.sh
+TEST_PROGS += quic.sh
 TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
 TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
 TEST_GEN_FILES =  socket nettest
@@ -59,7 +60,7 @@ TEST_GEN_FILES += ipsec
 TEST_GEN_FILES += ioam6_parser
 TEST_GEN_FILES += gro
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
-TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun tap
+TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun tap quic
 TEST_GEN_FILES += toeplitz
 TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
diff --git a/tools/testing/selftests/net/quic.c b/tools/testing/selftests/net/quic.c
new file mode 100644
index 000000000000..81285a6d9601
--- /dev/null
+++ b/tools/testing/selftests/net/quic.c
@@ -0,0 +1,1369 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/limits.h>
+#include <linux/quic.h>
+#include <linux/socket.h>
+#include <linux/tls.h>
+#include <linux/tcp.h>
+#include <linux/types.h>
+#include <linux/udp.h>
+
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/sendfile.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+
+#include "../kselftest_harness.h"
+
+#define UDP_ULP		105
+
+#ifndef SOL_UDP
+#define SOL_UDP		17
+#endif
+
+// 1. QUIC ULP Registration Test
+
+FIXTURE(quic_ulp)
+{
+	int sfd;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_ulp)
+{
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv4)
+{
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7101,
+};
+
+FIXTURE_VARIANT_ADD(quic_ulp, ipv6)
+{
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7102,
+};
+
+FIXTURE_SETUP(quic_ulp)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+FIXTURE_TEARDOWN(quic_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_nonexistent_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "nonexistent", sizeof("nonexistent")), -1);
+	// If UDP_ULP option is not present, the error would be ENOPROTOOPT.
+	ASSERT_EQ(errno, ENOENT);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+TEST_F(quic_ulp, request_quic_crypto_udp_ulp)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+};
+
+// 2. QUIC Data Path Operation Tests
+
+#define DO_NOT_SETUP_FLOW 0
+#define SETUP_FLOW 1
+
+#define DO_NOT_USE_CLIENT 0
+#define USE_CLIENT 1
+
+FIXTURE(quic_data)
+{
+	int sfd, c1fd, c2fd;
+	socklen_t len_c1;
+	socklen_t len_c2;
+	socklen_t len_s;
+
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_1;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client_2;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_1_net_ns_fd;
+	int client_2_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_data)
+{
+	unsigned int af_client_1;
+	char *client_1_address;
+	unsigned short client_1_port;
+	uint8_t conn_id_1[8];
+	uint8_t conn_1_key[16];
+	uint8_t conn_1_iv[12];
+	uint8_t conn_1_hdr_key[16];
+	size_t conn_id_1_len;
+	bool setup_flow_1;
+	bool use_client_1;
+	unsigned int af_client_2;
+	char *client_2_address;
+	unsigned short client_2_port;
+	uint8_t conn_id_2[8];
+	uint8_t conn_2_key[16];
+	uint8_t conn_2_iv[12];
+	uint8_t conn_2_hdr_key[16];
+	size_t conn_id_2_len;
+	bool setup_flow_2;
+	bool use_client_2;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv4)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.1",
+	.client_1_port = 6667,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6668,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	//.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 6669,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_two_conns)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.1",
+	.client_1_port = 6670,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6671,
+	.conn_id_2 = {0x21, 0x22, 0x23, 0x24},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6672,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv4_one_conn)
+{
+	.af_client_1 = AF_INET,
+	.client_1_address = "10.0.0.3",
+	.client_1_port = 6676,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_1_len = 4,
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET6,
+	.client_2_address = "::ffff:10.0.0.3",
+	.client_2_port = 6676,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.conn_id_2_len = 4,
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6677,
+};
+
+FIXTURE_VARIANT_ADD(quic_data, ipv6_mapped_ipv4_setup_ipv6_one_conn)
+{
+	.af_client_1 = AF_INET6,
+	.client_1_address = "::ffff:10.0.0.3",
+	.client_1_port = 6678,
+	.conn_id_1 = {0x11, 0x12, 0x13, 0x14},
+	.setup_flow_1 = SETUP_FLOW,
+	.use_client_1 = DO_NOT_USE_CLIENT,
+	.af_client_2 = AF_INET,
+	.client_2_address = "10.0.0.3",
+	.client_2_port = 6678,
+	.conn_id_2 = {0x11, 0x12, 0x13, 0x14},
+	.setup_flow_2 = DO_NOT_SETUP_FLOW,
+	.use_client_2 = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "::ffff:10.0.0.2",
+	.server_port = 6679,
+};
+
+FIXTURE_SETUP(quic_data)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client_1 == AF_INET) {
+		self->len_c1 = sizeof(self->client_1.addr);
+		self->client_1.addr.sin_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr.sin_addr);
+		self->client_1.addr.sin_port = htons(variant->client_1_port);
+	} else {
+		self->len_c1 = sizeof(self->client_1.addr6);
+		self->client_1.addr6.sin6_family = variant->af_client_1;
+		inet_pton(variant->af_client_1, variant->client_1_address,
+			  &self->client_1.addr6.sin6_addr);
+		self->client_1.addr6.sin6_port = htons(variant->client_1_port);
+	}
+
+	if (variant->af_client_2 == AF_INET) {
+		self->len_c2 = sizeof(self->client_2.addr);
+		self->client_2.addr.sin_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr.sin_addr);
+		self->client_2.addr.sin_port = htons(variant->client_2_port);
+	} else {
+		self->len_c2 = sizeof(self->client_2.addr6);
+		self->client_2.addr6.sin6_family = variant->af_client_2;
+		inet_pton(variant->af_client_2, variant->client_2_address,
+			  &self->client_2.addr6.sin6_addr);
+		self->client_2.addr6.sin6_port = htons(variant->client_2_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_1_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_1_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns12");
+	self->client_2_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_2_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		self->c1fd = socket(variant->af_client_1, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c1fd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client_1 == AF_INET) {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr,
+					      &self->len_c1), 0);
+		} else {
+			ASSERT_EQ(bind(self->c1fd, &self->client_1.addr6,
+				       self->len_c1), 0);
+			ASSERT_EQ(getsockname(self->c1fd, &self->client_1.addr6,
+					      &self->len_c1), 0);
+		}
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		self->c2fd = socket(variant->af_client_2, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->c2fd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client_2 == AF_INET) {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr,
+					      &self->len_c2), 0);
+		} else {
+			ASSERT_EQ(bind(self->c2fd, &self->client_2.addr6,
+				       self->len_c2), 0);
+			ASSERT_EQ(getsockname(self->c2fd, &self->client_2.addr6,
+					      &self->len_c2), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s), 0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s), 0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_data)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+	close(self->c1fd);
+	ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+	close(self->c2fd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_data, send_fail_no_flow)
+{
+	char const *test_str = "test_read";
+	int send_len = 10;
+
+	ASSERT_EQ(strlen(test_str) + 1, send_len);
+	EXPECT_EQ(sendto(self->sfd, test_str, send_len, 0,
+			 &self->client_1.addr, self->len_c1), -1);
+};
+
+TEST_F(quic_data, fail_wrong_key_generation_bit)
+{
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_connection_info conn_1_info;
+	struct quic_connection_info conn_2_info;
+	struct quic_tx_ancillary_data *anc_data;
+	struct cmsghdr *cmsg_hdr;
+	int frag_size = 1200;
+	struct iovec iov[2];
+	int msg_len = 4500;
+	struct msghdr msg;
+	char *test_str_1;
+	char *test_str_2;
+	char *buf_1;
+	char *buf_2;
+	int i;
+
+	test_str_1 = (char *)malloc(9000);
+	test_str_2 = (char *)malloc(9000);
+	memset(test_str_1, 0, 9000);
+	memset(test_str_2, 0, 9000);
+
+	buf_1 = (char *)malloc(10000);
+	buf_2 = (char *)malloc(10000);
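+	// 0x44 sets the key phase bit (0x04) on top of the short header form
+	// (0x40); the connections below are installed with
+	// conn_payload_key_gen = 0, so the sendmsg() calls are expected to
+	// fail.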
+	for (i = 0; i < 9000; i += (1200 - 16)) {
+		test_str_1[i] = 0x44;
+		memcpy(&test_str_1[i + 1], &variant->conn_id_1,
+		       variant->conn_id_1_len);
+		test_str_1[i + 1 + variant->conn_id_1_len] = 0xca;
+
+		test_str_2[i] = 0x44;
+		memcpy(&test_str_2[i + 1], &variant->conn_id_2,
+		       variant->conn_id_2_len);
+		test_str_2[i + 1 + variant->conn_id_2_len] = 0xca;
+	}
+
+	// program the connection into the offload
+	conn_1_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_1_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_1_info.key.dst_conn_id_length = variant->conn_id_1_len;
+	memcpy(conn_1_info.key.dst_conn_id,
+	       &variant->conn_id_1,
+	       variant->conn_id_1_len);
+	conn_1_info.conn_payload_key_gen = 0;
+
+	if (self->client_1.addr.sin_family == AF_INET) {
+		memcpy(&conn_1_info.key.addr.ipv4_addr,
+		       &self->client_1.addr.sin_addr, sizeof(struct in_addr));
+		conn_1_info.key.udp_port = self->client_1.addr.sin_port;
+	} else {
+		memcpy(&conn_1_info.key.addr.ipv6_addr,
+		       &self->client_1.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_1_info.key.udp_port = self->client_1.addr6.sin6_port;
+	}
+
+	conn_2_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_2_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_2_info.key.dst_conn_id_length = variant->conn_id_2_len;
+	memcpy(conn_2_info.key.dst_conn_id,
+	       &variant->conn_id_2,
+	       variant->conn_id_2_len);
+	conn_2_info.conn_payload_key_gen = 0;
+
+	if (self->client_2.addr.sin_family == AF_INET) {
+		memcpy(&conn_2_info.key.addr.ipv4_addr,
+		       &self->client_2.addr.sin_addr, sizeof(struct in_addr));
+		conn_2_info.key.udp_port = self->client_2.addr.sin_port;
+	} else {
+		memcpy(&conn_2_info.key.addr.ipv6_addr,
+		       &self->client_2.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_2_info.key.udp_port = self->client_2.addr6.sin6_port;
+	}
+
+	memcpy(&conn_1_info.aes_gcm_128.payload_key,
+	       &variant->conn_1_key, 16);
+	memcpy(&conn_1_info.aes_gcm_128.payload_iv,
+	       &variant->conn_1_iv, 12);
+	memcpy(&conn_1_info.aes_gcm_128.header_key,
+	       &variant->conn_1_hdr_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_key,
+	       &variant->conn_2_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_iv,
+	       &variant->conn_2_iv, 12);
+	memcpy(&conn_2_info.aes_gcm_128.header_key,
+	       &variant->conn_2_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+
+	if (variant->setup_flow_1)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)), 0);
+
+	if (variant->setup_flow_2)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)), 0);
+
+	iov[0].iov_base = test_str_1;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_1 + 4500;
+	iov[1].iov_len = msg_len;
+
+	msg.msg_name = (self->client_1.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_1.addr
+		       : (void *)&self->client_1.addr6;
+	msg.msg_namelen = self->len_c1;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->flags = 0;
+	anc_data->dst_conn_id_length = variant->conn_id_1_len;
+
+	if (variant->use_client_1)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), -1);
+
+	iov[0].iov_base = test_str_2;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_2 + 4500;
+	iov[1].iov_len = msg_len;
+	msg.msg_name = (self->client_2.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_2.addr
+		       : (void *)&self->client_2.addr6;
+	msg.msg_namelen = self->len_c2;
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->dst_conn_id_length = variant->conn_id_2_len;
+	anc_data->flags = 0;
+
+	if (variant->use_client_2)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), -1);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	if (variant->setup_flow_1) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)),
+			  0);
+	}
+	if (variant->setup_flow_2) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)),
+			  0);
+	}
+	free(test_str_1);
+	free(test_str_2);
+	free(buf_1);
+	free(buf_2);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_data, encrypt_two_conn_gso_1200_iov_2_size_9000_aesgcm128)
+{
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_connection_info conn_1_info;
+	struct quic_connection_info conn_2_info;
+	struct quic_tx_ancillary_data *anc_data;
+	socklen_t recv_addr_len_1;
+	socklen_t recv_addr_len_2;
+	struct cmsghdr *cmsg_hdr;
+	int frag_size = 1200;
+	int send_len = 9000;
+	struct iovec iov[2];
+	int msg_len = 4500;
+	struct msghdr msg;
+	char *test_str_1;
+	char *test_str_2;
+	char *buf_1;
+	char *buf_2;
+	int i;
+
+	test_str_1 = (char *)malloc(9000);
+	test_str_2 = (char *)malloc(9000);
+	memset(test_str_1, 0, 9000);
+	memset(test_str_2, 0, 9000);
+
+	buf_1 = (char *)malloc(10000);
+	buf_2 = (char *)malloc(10000);
+	for (i = 0; i < 9000; i += (1200 - 16)) {
+		test_str_1[i] = 0x40;
+		memcpy(&test_str_1[i + 1], &variant->conn_id_1,
+		       variant->conn_id_1_len);
+		test_str_1[i + 1 + variant->conn_id_1_len] = 0xca;
+
+		test_str_2[i] = 0x40;
+		memcpy(&test_str_2[i + 1], &variant->conn_id_2,
+		       variant->conn_id_2_len);
+		test_str_2[i + 1 + variant->conn_id_2_len] = 0xca;
+	}
+
+	// program the connection into the offload
+	conn_1_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_1_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_1_info.key.dst_conn_id_length = variant->conn_id_1_len;
+	memcpy(conn_1_info.key.dst_conn_id,
+	       &variant->conn_id_1,
+	       variant->conn_id_1_len);
+	conn_1_info.conn_payload_key_gen = 0;
+
+	if (self->client_1.addr.sin_family == AF_INET) {
+		memcpy(&conn_1_info.key.addr.ipv4_addr,
+		       &self->client_1.addr.sin_addr, sizeof(struct in_addr));
+		conn_1_info.key.udp_port = self->client_1.addr.sin_port;
+	} else {
+		memcpy(&conn_1_info.key.addr.ipv6_addr,
+		       &self->client_1.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_1_info.key.udp_port = self->client_1.addr6.sin6_port;
+	}
+
+	conn_2_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(&conn_2_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_2_info.key.dst_conn_id_length = variant->conn_id_2_len;
+	memcpy(conn_2_info.key.dst_conn_id,
+	       &variant->conn_id_2,
+	       variant->conn_id_2_len);
+	conn_2_info.conn_payload_key_gen = 0;
+
+	if (self->client_2.addr.sin_family == AF_INET) {
+		memcpy(&conn_2_info.key.addr.ipv4_addr,
+		       &self->client_2.addr.sin_addr, sizeof(struct in_addr));
+		conn_2_info.key.udp_port = self->client_2.addr.sin_port;
+	} else {
+		memcpy(&conn_2_info.key.addr.ipv6_addr,
+		       &self->client_2.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_2_info.key.udp_port = self->client_2.addr6.sin6_port;
+	}
+
+	memcpy(&conn_1_info.aes_gcm_128.payload_key,
+	       &variant->conn_1_key, 16);
+	memcpy(&conn_1_info.aes_gcm_128.payload_iv,
+	       &variant->conn_1_iv, 12);
+	memcpy(&conn_1_info.aes_gcm_128.header_key,
+	       &variant->conn_1_hdr_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_key,
+	       &variant->conn_2_key, 16);
+	memcpy(&conn_2_info.aes_gcm_128.payload_iv,
+	       &variant->conn_2_iv, 12);
+	memcpy(&conn_2_info.aes_gcm_128.header_key,
+	       &variant->conn_2_hdr_key,
+	       16);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+
+	if (variant->setup_flow_1)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)), 0);
+
+	if (variant->setup_flow_2)
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_ADD_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)), 0);
+
+	recv_addr_len_1 = self->len_c1;
+	recv_addr_len_2 = self->len_c2;
+
+	iov[0].iov_base = test_str_1;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_1 + 4500;
+	iov[1].iov_len = msg_len;
+
+	msg.msg_name = (self->client_1.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_1.addr
+		       : (void *)&self->client_1.addr6;
+	msg.msg_namelen = self->len_c1;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->flags = 0;
+	anc_data->dst_conn_id_length = variant->conn_id_1_len;
+
+	if (variant->use_client_1)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	iov[0].iov_base = test_str_2;
+	iov[0].iov_len = msg_len;
+	iov[1].iov_base = (void *)test_str_2 + 4500;
+	iov[1].iov_len = msg_len;
+	msg.msg_name = (self->client_2.addr.sin_family == AF_INET)
+		       ? (void *)&self->client_2.addr
+		       : (void *)&self->client_2.addr6;
+	msg.msg_namelen = self->len_c2;
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->dst_conn_id_length = variant->conn_id_2_len;
+	anc_data->flags = 0;
+
+	if (variant->use_client_2)
+		EXPECT_EQ(sendmsg(self->sfd, &msg, 0), send_len);
+
+	if (variant->use_client_1) {
+		ASSERT_NE(setns(self->client_1_net_ns_fd, 0), -1);
+		if (variant->af_client_1 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr,
+						   &recv_addr_len_1),
+					  1200);
+				// Validate framing is intact.
+				EXPECT_EQ(memcmp((void *)buf_1 + 1,
+						 &variant->conn_id_1,
+						 variant->conn_id_1_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+						   &self->client_1.addr6,
+						   &recv_addr_len_1),
+					1200);
+			}
+			EXPECT_EQ(recvfrom(self->c1fd, buf_1, 9000, 0,
+					   &self->client_1.addr6,
+					   &recv_addr_len_1),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_1 + 1,
+					 &variant->conn_id_1,
+					 variant->conn_id_1_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_1, test_str_1, send_len), 0);
+	}
+
+	if (variant->use_client_2) {
+		ASSERT_NE(setns(self->client_2_net_ns_fd, 0), -1);
+		if (variant->af_client_2 == AF_INET) {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		} else {
+			for (i = 0; i < 7; ++i) {
+				EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+						   &self->client_2.addr6,
+						   &recv_addr_len_2),
+					  1200);
+				EXPECT_EQ(memcmp((void *)buf_2 + 1,
+						 &variant->conn_id_2,
+						 variant->conn_id_2_len), 0);
+			}
+			EXPECT_EQ(recvfrom(self->c2fd, buf_2, 9000, 0,
+					   &self->client_2.addr6,
+					   &recv_addr_len_2),
+				  728);
+			EXPECT_EQ(memcmp((void *)buf_2 + 1,
+					 &variant->conn_id_2,
+					 variant->conn_id_2_len), 0);
+		}
+		EXPECT_NE(memcmp(buf_2, test_str_2, send_len), 0);
+	}
+
+	if (variant->use_client_1 && variant->use_client_2)
+		EXPECT_NE(memcmp(buf_1, buf_2, send_len), 0);
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	if (variant->setup_flow_1) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_1_info, sizeof(conn_1_info)),
+			  0);
+	}
+	if (variant->setup_flow_2) {
+		ASSERT_EQ(setsockopt(self->sfd, SOL_UDP,
+				     UDP_QUIC_DEL_TX_CONNECTION,
+				     &conn_2_info, sizeof(conn_2_info)),
+			  0);
+	}
+	free(test_str_1);
+	free(test_str_2);
+	free(buf_1);
+	free(buf_2);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+// 3. QUIC Encryption Tests
+
+FIXTURE(quic_crypto)
+{
+	int sfd, cfd;
+	socklen_t len_c;
+	socklen_t len_s;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} client;
+	union {
+		struct sockaddr_in addr;
+		struct sockaddr_in6 addr6;
+	} server;
+	int default_net_ns_fd;
+	int client_net_ns_fd;
+	int server_net_ns_fd;
+};
+
+FIXTURE_VARIANT(quic_crypto)
+{
+	unsigned int af_client;
+	char *client_address;
+	unsigned short client_port;
+	uint32_t algo;
+	size_t conn_key_len;
+	uint8_t conn_id[8];
+	union {
+		uint8_t conn_key_16[16];
+		uint8_t conn_key_32[32];
+	} conn_key;
+	uint8_t conn_iv[12];
+	union {
+		uint8_t conn_hdr_key_16[16];
+		uint8_t conn_hdr_key_32[32];
+	} conn_hdr_key;
+	size_t conn_id_len;
+	bool setup_flow;
+	bool use_client;
+	unsigned int af_server;
+	char *server_address;
+	unsigned short server_port;
+	char plain[128];
+	size_t plain_len;
+	char match[128];
+	size_t match_len;
+	uint32_t next_pkt_num;
+};
+
+FIXTURE_SETUP(quic_crypto)
+{
+	char path[PATH_MAX];
+	int optval = 1;
+
+	if (variant->af_client == AF_INET) {
+		self->len_c = sizeof(self->client.addr);
+		self->client.addr.sin_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr.sin_addr);
+		self->client.addr.sin_port = htons(variant->client_port);
+	} else {
+		self->len_c = sizeof(self->client.addr6);
+		self->client.addr6.sin6_family = variant->af_client;
+		inet_pton(variant->af_client, variant->client_address,
+			  &self->client.addr6.sin6_addr);
+		self->client.addr6.sin6_port = htons(variant->client_port);
+	}
+
+	if (variant->af_server == AF_INET) {
+		self->len_s = sizeof(self->server.addr);
+		self->server.addr.sin_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr.sin_addr);
+		self->server.addr.sin_port = htons(variant->server_port);
+	} else {
+		self->len_s = sizeof(self->server.addr6);
+		self->server.addr6.sin6_family = variant->af_server;
+		inet_pton(variant->af_server, variant->server_address,
+			  &self->server.addr6.sin6_addr);
+		self->server.addr6.sin6_port = htons(variant->server_port);
+	}
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", getpid());
+	self->default_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->default_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns11");
+	self->client_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->client_net_ns_fd, 0);
+	strcpy(path, "/var/run/netns/ns2");
+	self->server_net_ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(self->server_net_ns_fd, 0);
+
+	if (variant->use_client) {
+		ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+		self->cfd = socket(variant->af_client, SOCK_DGRAM, 0);
+		ASSERT_NE(setsockopt(self->cfd, SOL_SOCKET, SO_REUSEPORT,
+				     &optval, sizeof(optval)), -1);
+		if (variant->af_client == AF_INET) {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr,
+					      &self->len_c), 0);
+		} else {
+			ASSERT_EQ(bind(self->cfd, &self->client.addr6,
+				       self->len_c), 0);
+			ASSERT_EQ(getsockname(self->cfd, &self->client.addr6,
+					      &self->len_c), 0);
+		}
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	self->sfd = socket(variant->af_server, SOCK_DGRAM, 0);
+	ASSERT_NE(setsockopt(self->sfd, SOL_SOCKET, SO_REUSEPORT, &optval,
+			     sizeof(optval)), -1);
+	if (variant->af_server == AF_INET) {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr,
+				      &self->len_s),
+			  0);
+	} else {
+		ASSERT_EQ(bind(self->sfd, &self->server.addr6, self->len_s), 0);
+		ASSERT_EQ(getsockname(self->sfd, &self->server.addr6,
+				      &self->len_s),
+			  0);
+	}
+
+	ASSERT_EQ(setsockopt(self->sfd, IPPROTO_UDP, UDP_ULP,
+			     "quic-crypto", sizeof("quic-crypto")), 0);
+
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_TEARDOWN(quic_crypto)
+{
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	close(self->sfd);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	close(self->cfd);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4_aes_gcm_128)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7667,
+	.algo = TLS_CIPHER_AES_GCM_128,
+	.conn_key_len = 16,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {
+		.conn_key_16 = {0x87, 0x71, 0xea, 0x1d,
+				0xfb, 0xbe, 0x7a, 0x45,
+				0xbb, 0xe2, 0x7e, 0xbc,
+				0x0b, 0x53, 0x94, 0x99
+		},
+	},
+	.conn_iv = {0x3A, 0xA7, 0x46, 0x72, 0xE9, 0x83, 0x6B, 0x55, 0xDA,
+		0x66, 0x7B, 0xDA},
+	.conn_hdr_key = {
+		.conn_hdr_key_16 = {0xc9, 0x8e, 0xfd, 0xf2,
+				    0x0b, 0x64, 0x8c, 0x57,
+				    0xb5, 0x0a, 0xb2, 0xd2,
+				    0x21, 0xd3, 0x66, 0xa5},
+	},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7669,
+	.plain = { 0x40, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0xca,
+		   // payload
+		   0x02, 0x80, 0xde, 0x40, 0x39, 0x40, 0xf6, 0x00,
+		   0x01, 0x0b, 0x00, 0x0f, 0x65, 0x63, 0x68, 0x6f,
+		   0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+		   0x37, 0x38, 0x39
+	},
+	.plain_len = 37,
+	.match = {
+		   0x46, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0x1c, 0x44, 0xb8, 0x41, 0xbb, 0xcf, 0x6e,
+		   0x0a, 0x2a, 0x24, 0xfb, 0xb4, 0x79, 0x62, 0xea,
+		   0x59, 0x38, 0x1a, 0x0e, 0x50, 0x1e, 0x59, 0xed,
+		   0x3f, 0x8e, 0x7e, 0x5a, 0x70, 0xe4, 0x2a, 0xbc,
+		   0x2a, 0xfa, 0x2b, 0x54, 0xeb, 0x89, 0xc3, 0x2c,
+		   0xb6, 0x8c, 0x1e, 0xab, 0x2d
+	},
+	.match_len = 53,
+	.next_pkt_num = 0x0d65c9,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv4_chacha20_poly1305)
+{
+	.af_client = AF_INET,
+	.client_address = "10.0.0.1",
+	.client_port = 7801,
+	.algo = TLS_CIPHER_CHACHA20_POLY1305,
+	.conn_key_len = 32,
+	.conn_id = {},
+	.conn_id_len = 0,
+	.conn_key = {
+		.conn_key_32 = {
+			0x3b, 0xfc, 0xdd, 0xd7, 0x2b, 0xcf, 0x02, 0x54,
+			0x1d, 0x7f, 0xa0, 0xdd, 0x1f, 0x5f, 0x9e, 0xee,
+			0xa8, 0x17, 0xe0, 0x9a, 0x69, 0x63, 0xa0, 0xe6,
+			0xc7, 0xdf, 0x0f, 0x9a, 0x1b, 0xab, 0x90, 0xf2,
+		},
+	},
+	.conn_iv = {
+		0xa6, 0xb5, 0xbc, 0x6a, 0xb7, 0xda, 0xfc, 0xe3,
+		0x0f, 0xff, 0xf5, 0xdd,
+	},
+	.conn_hdr_key = {
+		.conn_hdr_key_32 = {
+			0xd6, 0x59, 0x76, 0x0d, 0x2b, 0xa4, 0x34, 0xa2,
+			0x26, 0xfd, 0x37, 0xb3, 0x5c, 0x69, 0xe2, 0xda,
+			0x82, 0x11, 0xd1, 0x0c, 0x4f, 0x12, 0x53, 0x87,
+			0x87, 0xd6, 0x56, 0x45, 0xd5, 0xd1, 0xb8, 0xe2,
+		},
+	},
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET,
+	.server_address = "10.0.0.2",
+	.server_port = 7802,
+	.plain = { 0x42, 0x00, 0xbf, 0xf4, 0x01 },
+	.plain_len = 5,
+	.match = { 0x55, 0x58, 0xb1, 0xc6, 0x0a, 0xe7, 0xb6, 0xb9,
+		   0x32, 0xbc, 0x27, 0xd7, 0x86, 0xf4, 0xbc, 0x2b,
+		   0xb2, 0x0f, 0x21, 0x62, 0xba },
+	.match_len = 21,
+	.next_pkt_num = 0x2700bff5,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6_aes_gcm_128)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7673,
+	.algo = TLS_CIPHER_AES_GCM_128,
+	.conn_key_len = 16,
+	.conn_id = {0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12, 0x49},
+	.conn_key = {
+		.conn_key_16 = {0x87, 0x71, 0xea, 0x1d,
+				0xfb, 0xbe, 0x7a, 0x45,
+				0xbb, 0xe2, 0x7e, 0xbc,
+				0x0b, 0x53, 0x94, 0x99
+		},
+	},
+	.conn_iv = {0x3a, 0xa7, 0x46, 0x72, 0xe9, 0x83, 0x6b, 0x55, 0xda,
+		0x66, 0x7b, 0xda},
+	.conn_hdr_key = {
+		.conn_hdr_key_16 = {0xc9, 0x8e, 0xfd, 0xf2,
+				    0x0b, 0x64, 0x8c, 0x57,
+				    0xb5, 0x0a, 0xb2, 0xd2,
+				    0x21, 0xd3, 0x66, 0xa5},
+	},
+	.conn_id_len = 8,
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7675,
+	.plain = { 0x40, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0xca,
+		   // Payload
+		   0x02, 0x80, 0xde, 0x40, 0x39, 0x40, 0xf6, 0x00,
+		   0x01, 0x0b, 0x00, 0x0f, 0x65, 0x63, 0x68, 0x6f,
+		   0x20, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36,
+		   0x37, 0x38, 0x39
+	},
+	.plain_len = 37,
+	.match = {
+		   0x46, 0x08, 0x6b, 0xbf, 0x88, 0x82, 0xb9, 0x12,
+		   0x49, 0x1c, 0x44, 0xb8, 0x41, 0xbb, 0xcf, 0x6e,
+		   0x0a, 0x2a, 0x24, 0xfb, 0xb4, 0x79, 0x62, 0xea,
+		   0x59, 0x38, 0x1a, 0x0e, 0x50, 0x1e, 0x59, 0xed,
+		   0x3f, 0x8e, 0x7e, 0x5a, 0x70, 0xe4, 0x2a, 0xbc,
+		   0x2a, 0xfa, 0x2b, 0x54, 0xeb, 0x89, 0xc3, 0x2c,
+		   0xb6, 0x8c, 0x1e, 0xab, 0x2d
+	},
+	.match_len = 53,
+	.next_pkt_num = 0x0d65c9,
+};
+
+FIXTURE_VARIANT_ADD(quic_crypto, ipv6_chacha20_poly1305)
+{
+	.af_client = AF_INET6,
+	.client_address = "2001::1",
+	.client_port = 7803,
+	.algo = TLS_CIPHER_CHACHA20_POLY1305,
+	.conn_key_len = 32,
+	.conn_id = {},
+	.conn_id_len = 0,
+	.conn_key = {
+		.conn_key_32 = {
+			0x3b, 0xfc, 0xdd, 0xd7, 0x2b, 0xcf, 0x02, 0x54,
+			0x1d, 0x7f, 0xa0, 0xdd, 0x1f, 0x5f, 0x9e, 0xee,
+			0xa8, 0x17, 0xe0, 0x9a, 0x69, 0x63, 0xa0, 0xe6,
+			0xc7, 0xdf, 0x0f, 0x9a, 0x1b, 0xab, 0x90, 0xf2,
+		},
+	},
+	.conn_iv = {
+		0xa6, 0xb5, 0xbc, 0x6a, 0xb7, 0xda, 0xfc, 0xe3,
+		0x0f, 0xff, 0xf5, 0xdd,
+	},
+	.conn_hdr_key = {
+		.conn_hdr_key_32 = {
+			0xd6, 0x59, 0x76, 0x0d, 0x2b, 0xa4, 0x34, 0xa2,
+			0x26, 0xfd, 0x37, 0xb3, 0x5c, 0x69, 0xe2, 0xda,
+			0x82, 0x11, 0xd1, 0x0c, 0x4f, 0x12, 0x53, 0x87,
+			0x87, 0xd6, 0x56, 0x45, 0xd5, 0xd1, 0xb8, 0xe2,
+		},
+	},
+	.setup_flow = SETUP_FLOW,
+	.use_client = USE_CLIENT,
+	.af_server = AF_INET6,
+	.server_address = "2001::2",
+	.server_port = 7804,
+	.plain = { 0x42, 0x00, 0xbf, 0xf4, 0x01 },
+	.plain_len = 5,
+	.match = { 0x55, 0x58, 0xb1, 0xc6, 0x0a, 0xe7, 0xb6, 0xb9,
+		   0x32, 0xbc, 0x27, 0xd7, 0x86, 0xf4, 0xbc, 0x2b,
+		   0xb2, 0x0f, 0x21, 0x62, 0xba },
+	.match_len = 21,
+	.next_pkt_num = 0x2700bff5,
+};
+
+TEST_F(quic_crypto, encrypt_test_vector_single_flow_gso_in_control)
+{
+	uint8_t cmsg_buf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))
+			 + CMSG_SPACE(sizeof(uint16_t))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	uint16_t frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	int wrong_frag_size = 26;
+	socklen_t recv_addr_len;
+	struct iovec iov;
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(9000);
+	conn_info.cipher_type = variant->algo;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.dst_conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.dst_conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	conn_info.conn_payload_key_gen = 0;
+
+	if (self->client.addr.sin_family == AF_INET) {
+		memcpy(&conn_info.key.addr.ipv4_addr,
+		       &self->client.addr.sin_addr, sizeof(struct in_addr));
+		conn_info.key.udp_port = self->client.addr.sin_port;
+	} else {
+		memcpy(&conn_info.key.addr.ipv6_addr,
+		       &self->client.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_info.key.udp_port = self->client.addr6.sin6_port;
+	}
+
+	ASSERT_TRUE(variant->algo == TLS_CIPHER_AES_GCM_128 ||
+		    variant->algo == TLS_CIPHER_CHACHA20_POLY1305);
+	switch (variant->algo) {
+	case TLS_CIPHER_AES_GCM_128:
+		memcpy(&conn_info.aes_gcm_128.payload_key,
+		       &variant->conn_key, 16);
+		memcpy(&conn_info.aes_gcm_128.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.aes_gcm_128.header_key,
+		       &variant->conn_hdr_key, 16);
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		memcpy(&conn_info.chacha20_poly1305.payload_key,
+		       &variant->conn_key, 32);
+		memcpy(&conn_info.chacha20_poly1305.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.chacha20_poly1305.header_key,
+		       &variant->conn_hdr_key, 32);
+		break;
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &wrong_frag_size,
+			     sizeof(wrong_frag_size)), 0);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+	iov.iov_base = (void *)variant->plain;
+	iov.iov_len = variant->plain_len;
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(struct quic_tx_ancillary_data));
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = variant->next_pkt_num;
+	anc_data->dst_conn_id_length = variant->conn_id_len;
+	cmsg_hdr = CMSG_NXTHDR(&msg, cmsg_hdr);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_SEGMENT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+	memcpy(CMSG_DATA(cmsg_hdr), (void *)&frag_size, sizeof(frag_size));
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), variant->plain_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr, &recv_addr_len),
+			  variant->match_len);
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  variant->match_len);
+	}
+	EXPECT_STREQ(buf, variant->match);
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_F(quic_crypto, encrypt_test_vector_single_flow_gso_in_setsockopt)
+{
+	uint8_t cmsg_buf[CMSG_SPACE(sizeof(struct quic_tx_ancillary_data))];
+	struct quic_tx_ancillary_data *anc_data;
+	struct quic_connection_info conn_info;
+	int frag_size = 1200;
+	struct cmsghdr *cmsg_hdr;
+	socklen_t recv_addr_len;
+	struct iovec iov;
+	struct msghdr msg;
+	char *buf;
+
+	buf = (char *)malloc(9000);
+	conn_info.cipher_type = variant->algo;
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.dst_conn_id_length = variant->conn_id_len;
+	memcpy(conn_info.key.dst_conn_id,
+	       &variant->conn_id,
+	       variant->conn_id_len);
+	conn_info.conn_payload_key_gen = 0;
+
+	if (self->client.addr.sin_family == AF_INET) {
+		memcpy(&conn_info.key.addr.ipv4_addr,
+		       &self->client.addr.sin_addr, sizeof(struct in_addr));
+		conn_info.key.udp_port = self->client.addr.sin_port;
+	} else {
+		memcpy(&conn_info.key.addr.ipv6_addr,
+		       &self->client.addr6.sin6_addr,
+		       sizeof(struct in6_addr));
+		conn_info.key.udp_port = self->client.addr6.sin6_port;
+	}
+	ASSERT_TRUE(variant->algo == TLS_CIPHER_AES_GCM_128 ||
+		    variant->algo == TLS_CIPHER_CHACHA20_POLY1305);
+	switch (variant->algo) {
+	case TLS_CIPHER_AES_GCM_128:
+		memcpy(&conn_info.aes_gcm_128.payload_key,
+		       &variant->conn_key, 16);
+		memcpy(&conn_info.aes_gcm_128.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.aes_gcm_128.header_key,
+		       &variant->conn_hdr_key, 16);
+		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		memcpy(&conn_info.chacha20_poly1305.payload_key,
+		       &variant->conn_key, 32);
+		memcpy(&conn_info.chacha20_poly1305.payload_iv,
+		       &variant->conn_iv, 12);
+		memcpy(&conn_info.chacha20_poly1305.header_key,
+		       &variant->conn_hdr_key, 32);
+		break;
+	}
+
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+			     sizeof(frag_size)), 0);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+
+	recv_addr_len = self->len_c;
+	iov.iov_base = (void *)variant->plain;
+	iov.iov_len = variant->plain_len;
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	msg.msg_name = (self->client.addr.sin_family == AF_INET)
+		       ? (void *)&self->client.addr
+		       : (void *)&self->client.addr6;
+	msg.msg_namelen = self->len_c;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(sizeof(struct quic_tx_ancillary_data));
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = variant->next_pkt_num;
+	anc_data->dst_conn_id_length = variant->conn_id_len;
+
+	EXPECT_EQ(sendmsg(self->sfd, &msg, 0), variant->plain_len);
+	ASSERT_NE(setns(self->client_net_ns_fd, 0), -1);
+	if (variant->af_client == AF_INET) {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr, &recv_addr_len),
+			  variant->match_len);
+	} else {
+		EXPECT_EQ(recvfrom(self->cfd, buf, 9000, 0,
+				   &self->client.addr6, &recv_addr_len),
+			  variant->match_len);
+	}
+	EXPECT_STREQ(buf, variant->match);
+	ASSERT_NE(setns(self->server_net_ns_fd, 0), -1);
+	ASSERT_EQ(setsockopt(self->sfd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION,
+			     &conn_info, sizeof(conn_info)), 0);
+	free(buf);
+	ASSERT_NE(setns(self->default_net_ns_fd, 0), -1);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/quic.sh b/tools/testing/selftests/net/quic.sh
new file mode 100755
index 000000000000..8ff8bc494671
--- /dev/null
+++ b/tools/testing/selftests/net/quic.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+sudo ip netns add ns11
+sudo ip netns add ns12
+sudo ip netns add ns2
+sudo ip link add veth11 type veth peer name br-veth11
+sudo ip link add veth12 type veth peer name br-veth12
+sudo ip link add veth2 type veth peer name br-veth2
+sudo ip link set veth11 netns ns11
+sudo ip link set veth12 netns ns12
+sudo ip link set veth2 netns ns2
+sudo ip netns exec ns11 ip addr add 10.0.0.1/24 dev veth11
+sudo ip netns exec ns11 ip addr add ::ffff:10.0.0.1/96 dev veth11
+sudo ip netns exec ns11 ip addr add 2001::1/64 dev veth11
+sudo ip netns exec ns12 ip addr add 10.0.0.3/24 dev veth12
+sudo ip netns exec ns12 ip addr add ::ffff:10.0.0.3/96 dev veth12
+sudo ip netns exec ns12 ip addr add 2001::3/64 dev veth12
+sudo ip netns exec ns2 ip addr add 10.0.0.2/24 dev veth2
+sudo ip netns exec ns2 ip addr add ::ffff:10.0.0.2/96 dev veth2
+sudo ip netns exec ns2 ip addr add 2001::2/64 dev veth2
+sudo ip link add name br1 type bridge forward_delay 0
+sudo ip link set br1 up
+sudo ip link set br-veth11 up
+sudo ip link set br-veth12 up
+sudo ip link set br-veth2 up
+sudo ip netns exec ns11 ip link set veth11 up
+sudo ip netns exec ns12 ip link set veth12 up
+sudo ip netns exec ns2 ip link set veth2 up
+sudo ip link set br-veth11 master br1
+sudo ip link set br-veth12 master br1
+sudo ip link set br-veth2 master br1
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+
+printf "%s" "Waiting for bridge to start fowarding ..."
+while ! timeout 0.5 sudo ip netns exec ns2 ping -c 1 -n 2001::1 &> /dev/null
+do
+	printf "%c" "."
+done
+printf "\n%s\n"  "Bridge is operational"
+
+sudo ./quic
+sudo ip netns exec ns2 cat /proc/net/quic_stat
+sudo ip netns delete ns2
+sudo ip netns delete ns12
+sudo ip netns delete ns11
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [net-next v4 1/6] net: Documentation on QUIC kernel Tx crypto.
  2022-09-09  0:12   ` [net-next v4 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
@ 2022-09-09  1:40     ` Bagas Sanjaya
  0 siblings, 0 replies; 77+ messages in thread
From: Bagas Sanjaya @ 2022-09-09  1:40 UTC (permalink / raw)
  To: Adel Abouchaev, davem, edumazet, kuba, pabeni, corbet, dsahern,
	shuah, paul, jmorris, serge, linux-security-module, netdev,
	linux-doc, linux-kselftest
  Cc: kernel test robot

On 9/9/22 07:12, Adel Abouchaev wrote:
> Add documentation for kernel QUIC code.
> 

LGTM, thanks.

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next v2 0/6] net: support QUIC crypto
  2022-08-24 23:09     ` Adel Abouchaev
@ 2022-09-25 18:04       ` Willem de Bruijn
  2022-09-27 16:44         ` Adel Abouchaev
  0 siblings, 1 reply; 77+ messages in thread
From: Willem de Bruijn @ 2022-09-25 18:04 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: Xin Long, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Jonathan Corbet, David Ahern, shuah, imagedong, network dev,
	linux-doc, linux-kselftest

> >
> > The patch seems to get the crypto_ctx by doing a connection hash table
> > lookup in the sendmsg(), which is not good from the performance side.
> > One QUIC connection can go over multiple UDP sockets, but I don't
> > think one socket can be used by multiple QUIC connections. So why not
> > save the ctx in the socket instead?
> A single socket could have multiple connections originated from it,
> having different destinations, if the socket is not connected. An
> optimization could be made for connected sockets to cache the context
> and save time on a lookup. The measurement of kernel operations timing
> did not reveal a significant amount of time spent in this lookup due to
> a relatively small number of connections per socket in general. A shared
> table across multiple sockets might experience a different performance
> grading.

I'm late to this patch series, sorry. High quality implementation. I
have a few design questions similar to Xin.

If multiplexing, instead of looking up a connection by { address, port
variable length connection ID }, perhaps return a connection table
index on setsockopt and use that in sendmsg.

> >
> > The patch is to reduce the copying operations between user space and
> > the kernel. I might miss something in your user space code, but the
> > msg to send is *already packed* into the Stream Frame in user space,
> > what's the difference if you encrypt it in userspace and then
> > sendmsg(udp_sk) with zero-copy to the kernel.
> It is possible to do it this way. Zero-copy works best with packet sizes
> starting at 32K and larger.  Anything less than that would consume the
> improvements of zero-copy by zero-copy pre/post operations and needs to
> align memory.

Part of the cost of MSG_ZEROCOPY is in mapping and unmapping user
pages. This series re-implements that with its own get_user_pages.
That is duplicative non-trivial code. And it will incur the same cost.
What this implementation saves is the (indeed non-trivial)
asynchronous completion notification over the error queue.
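Roughly, the completion handling a userspace sender has to add looks
like the following sketch (UDP over IPv4 only; error handling trimmed,
and the helper name is made up for illustration):

#include <linux/errqueue.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Sketch only: drain MSG_ZEROCOPY completions from the error queue.
 * ee_info..ee_data is the range of zerocopy sendmsg() calls whose
 * buffers may now be reused.
 */
static void drain_zerocopy_completions(int fd)
{
	struct sock_extended_err *serr;
	struct msghdr msg = {0};
	struct cmsghdr *cm;
	char control[128];

	for (;;) {
		msg.msg_control = control;
		msg.msg_controllen = sizeof(control);
		if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
			break;	/* EAGAIN: queue drained */
		for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
			if (cm->cmsg_level != SOL_IP ||
			    cm->cmsg_type != IP_RECVERR)
				continue;
			serr = (struct sock_extended_err *)CMSG_DATA(cm);
			if (serr->ee_origin == SO_EE_ORIGIN_ZEROCOPY &&
			    serr->ee_errno == 0)
				printf("sends %u..%u done, buffers reusable\n",
				       serr->ee_info, serr->ee_data);
		}
	}
}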

The cover letter gives some performance numbers against a userspace
implementation that has to copy from user to kernel. It might be more
even to compare against an implementation using MSG_ZEROCOPY and
UDP_SEGMENT. A userspace crypto implementation may have other benefits
compared to a kernel implementation, such as not having to convert to
crypto API scatter-gather arrays and back to network structures.
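For reference, a minimal sketch of such a userspace baseline, assuming
the payload is already encrypted in user memory, the socket is
connected (msg_name omitted), and the zerocopy completions above are
drained elsewhere:

#include <netinet/udp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY	60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY	0x4000000
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT	103
#endif

/* Sketch only, not part of this series: transmit one buffer of
 * already-encrypted QUIC packets with MSG_ZEROCOPY and UDP GSO.
 */
static ssize_t send_zerocopy_gso(int fd, void *buf, size_t len, uint16_t gso)
{
	char cbuf[CMSG_SPACE(sizeof(uint16_t))] = {0};
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = cbuf,
		.msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
	int one = 1;

	/* doing this once at socket setup would suffice */
	if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)))
		return -1;

	/* ask UDP to segment the payload into gso-sized packets */
	cm->cmsg_level = SOL_UDP;
	cm->cmsg_type = UDP_SEGMENT;
	cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
	memcpy(CMSG_DATA(cm), &gso, sizeof(gso));

	return sendmsg(fd, &msg, MSG_ZEROCOPY);
}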

A few related points

- The implementation supports multiplexed connections, but only one
crypto sendmsg can be outstanding at any time:

  + /**
  + * To synchronize concurrent sendmsg() requests through the same socket
  + * and protect preallocated per-context memory.
  + **/
  + struct mutex sendmsg_mux;

That is quite limiting for production workloads.

- Crypto operations are also executed synchronously, using
crypto_wait_req after each operation. This limits throughput by using
at most one core per UDP socket. And adds sendmsg latency (which may
or may not be important to the application). Wireguard shows an
example of how to parallelize software crypto across cores.

- The implementation avoids dynamic allocation of cipher text pages by
using a single ctx->cipher_page. This is protected by sendmsg_mux (see
above). Is that safe when packets leave the protocol stack and are
then held in a qdisc or when being processed by the NIC?
quic_sendmsg_locked will return, but the cipher page is not free to
reuse yet.

- The real benefit of kernel QUIC will come from HW offload. Would it
be better to avoid the complexity of an in-kernel software
implementation and only focus on HW offload? Basically, pass the
plaintext QUIC packets over a standard UDP socket and alongside in a
cmsg pass either an index into a HW security association database or
the immediate { key, iv } connection_info (for stateless sockets), to
be encoded into the descriptor by the device driver.

- With such a simpler path, could we avoid introducing ULP and just
have udp [gs]etsockopt CRYPTO_STATE. Where QUIC is the only defined
state type yet.

- Small aside: as the series introduces new APIs with non-trivial
parsing in the kernel, it's good to run a fuzzer like syzkaller on it
(if not having done so yet).

> The other possible obstacle would be that eventual support
> of QUIC encryption and decryption in hardware would integrate well with
> this current approach.
> >
> > Didn't really understand the "GSO" you mentioned, as I don't see any
> > code about kernel GSO, I guess it's just "Fragment size", right?
> > BTW, it‘s not common to use "//" for the kernel annotation.

minor point: fragment has meaning in IPv4. For GSO, prefer gso_size.

> Once the payload arrives into the kernel, the GSO on the interface would
> instruct L3/L4 stack on fragmentation. In this case, the plaintext QUIC
> packets should be aligned on the GSO marks less the tag size that would
> be added by encryption. For GSO size 1000, the QUIC packets in the batch
> for transmission should all be 984 bytes long, except maybe the last
> one. Once the tag is attached, the new size of 1000 will correctly split
> the QUIC packets further down the stack for transmission in individual
> IP/UDP packets. The code is also saving processing time by sending all
> packets at once to UDP in a single call, when GSO is enabled.
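
As a minimal illustration of the sizing rule quoted above (assuming the
16-byte AEAD tag used by AES-GCM and ChaCha20-Poly1305; the helper name
is made up for this sketch):

#include <stddef.h>

#define QUIC_AEAD_TAG_LEN 16	/* AES-GCM / ChaCha20-Poly1305 tag size */

/* Sketch only: plaintext QUIC packets are sized gso_size minus the tag,
 * so that after the offload appends the tag each encrypted packet lands
 * exactly on a GSO boundary, e.g. 1000 - 16 = 984.
 */
static inline size_t quic_plaintext_len(size_t gso_size)
{
	return gso_size - QUIC_AEAD_TAG_LEN;
}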
> >
> > I'm not sure if it's worth adding a ULP layer over UDP for this QUIC
> > TX only. Honestly, I'm more supporting doing a full QUIC stack in the
> > kernel independently with socket APIs to use it:
> > https://github.com/lxin/tls_hs.
> >
> > Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next v2 0/6] net: support QUIC crypto
  2022-09-25 18:04       ` Willem de Bruijn
@ 2022-09-27 16:44         ` Adel Abouchaev
  2022-09-27 17:12           ` Willem de Bruijn
  0 siblings, 1 reply; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-27 16:44 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Xin Long, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Jonathan Corbet, David Ahern, shuah, imagedong, network dev,
	linux-doc, linux-kselftest


On 9/25/22 11:04 AM, Willem de Bruijn wrote:
>>> The patch seems to get the crypto_ctx by doing a connection hash table
>>> lookup in the sendmsg(), which is not good from the performance side.
>>> One QUIC connection can go over multiple UDP sockets, but I don't
>>> think one socket can be used by multiple QUIC connections. So why not
>>> save the ctx in the socket instead?
>> A single socket could have multiple connections originated from it,
>> having different destinations, if the socket is not connected. An
>> optimization could be made for connected sockets to cache the context
>> and save time on a lookup. The measurement of kernel operations timing
>> did not reveal a significant amount of time spent in this lookup due to
>> a relatively small number of connections per socket in general. A shared
>> table across multiple sockets might experience a different performance
>> grading.
> I'm late to this patch series, sorry. High quality implementation. I
> have a few design questions similar to Xin.
>
> If multiplexing, instead of looking up a connection by { address, port
> variable length connection ID }, perhaps return a connection table
> index on setsockopt and use that in sendmsg.


It was deliberate not to return anything other than 0 from
setsockopt(), as defined in the spec for the function. Although it
says "shall", the documentation lists 0 as the only value for a
successful operation, which is why setsockopt() was not used for any
bidirectional transfer of data or status. A more sophisticated
approach with netlink sockets would be better suited for that. The
second reason is the API asymmetry between Tx and Rx which it would
introduce - Rx will still need to match on the address, port and CID.
The third reason is that in current implementations there are no more
than a few connections per socket, so the rhashtable lookup is not
stressed, although hashing the key for the seek does take some time.
The performance measurements of the runtime did not flag this path as
underperforming either; other parts add substantially more to the
runtime than the key lookup.


>>> The patch is to reduce the copying operations between user space and
>>> the kernel. I might miss something in your user space code, but the
>>> msg to send is *already packed* into the Stream Frame in user space,
>>> what's the difference if you encrypt it in userspace and then
>>> sendmsg(udp_sk) with zero-copy to the kernel.
>> It is possible to do it this way. Zero-copy works best with packet sizes
>> starting at 32K and larger.  Anything less than that would consume the
>> improvements of zero-copy by zero-copy pre/post operations and needs to
>> align memory.
> Part of the cost of MSG_ZEROCOPY is in mapping and unmapping user
> pages. This series re-implements that with its own get_user_pages.
> That is duplicative non-trivial code. And it will incur the same cost.
> What this implementation saves is the (indeed non-trivial)
> asynchronous completion notification over the error queue.
>
> The cover letter gives some performance numbers against a userspace
> implementation that has to copy from user to kernel. It might be more
> even to compare against an implementation using MSG_ZEROCOPY and
> UDP_SEGMENT. A userspace crypto implementation may have other benefits
> compared to a kernel implementation, such as not having to convert to
> crypto API scatter-gather arrays and back to network structures.
>
> A few related points
>
> - The implementation support multiplexed connections, but only one
> crypto sendmsg can be outstanding at any time:
>
>    + /**
>    + * To synchronize concurrent sendmsg() requests through the same socket
>    + * and protect preallocated per-context memory.
>    + **/
>    + struct mutex sendmsg_mux;
>
> That is quite limiting for production workloads.

The use case that we have with the MVFST library currently runs a
single worker per connection, with a single socket attached to it.
QUIC allows simultaneous use of multiple connection IDs so they can be
swapped at runtime, and an implementation would request only a handful
of these. MVFST batches writes into a block of about 8KB and then uses
GSO to send them all at once.

> - Crypto operations are also executed synchronously, using
> crypto_wait_req after each operationn. This limits throughput by using
> at most one core per UDP socket. And adds sendmsg latency (which may
> or may not be important to the application). Wireguard shows an
> example of how to parallelize software crypto across cores.
>
> - The implementation avoids dynamic allocation of cipher text pages by
> using a single ctx->cipher_page. This is protected by sendmsg_mux (see
> above). Is that safe when packets leave the protocol stack and are
> then held in a qdisc or when being processed by the NIC?
> quic_sendmsg_locked will return, but the cipher page is not free to
> reuse yet.
There is currently no use case in hand that requires parallel
transmission of data for the same connection. Multiple connections
have no issue running in parallel, as each of them has its own
preallocated cipher_page in its context.

Fragmentation happens further down the stack in ip_generic_getfrag(),
which eventually calls copy_from_iter() and makes a copy of the data.
This runs as part of __ip_append_data(), called from udp_sendmsg() in
ipv4/udp.c. The assumption was that this is executed synchronously, so
the queues and the NIC will see a mapping of a different memory area
than the ciphertext in the pre-allocated page.

>
> - The real benefit of kernel QUIC will come from HW offload. Would it
> be better to avoid the complexity of an in-kernel software
> implementation and only focus on HW offload? Basically, pass the
> plaintext QUIC packets over a standard UDP socket and alongside in a
> cmsg pass either an index into a HW security association database or
> the immediate { key, iv } connection_info (for stateless sockets), to
> be encoded into the descriptor by the device driver.
Hardware usually targets a single ciphersuite such as AES-GCM-128/256,
while QUIC also supports ChaCha20-Poly1305 and AES-CCM. The generalized
support for offload prompted implementing these ciphers in kernel
code. The kernel code could also engage if future hardware has
capacity limits that prevent it from handling all requests in hardware.
> - With such a simpler path, could we avoid introducing ULP and just
> have udp [gs]etsockopt CRYPTO_STATE. Where QUIC is the only defined
> state type yet.
>
> - Small aside: as the series introduces new APIs with non-trivial
> parsing in the kernel, it's good to run a fuzzer like syzkaller on it
> (if not having done so yet).
Agreed.
>> The other possible obstacle would be that eventual support
>> of QUIC encryption and decryption in hardware would integrate well with
>> this current approach.
>>> Didn't really understand the "GSO" you mentioned, as I don't see any
>>> code about kernel GSO, I guess it's just "Fragment size", right?
>>> BTW, it‘s not common to use "//" for the kernel annotation.
> minor point: fragment has meaning in IPv4. For GSO, prefer gso_size.
Sure, will change it to gso_size.
>
>> Once the payload arrives into the kernel, the GSO on the interface would
>> instruct L3/L4 stack on fragmentation. In this case, the plaintext QUIC
>> packets should be aligned on the GSO marks less the tag size that would
>> be added by encryption. For GSO size 1000, the QUIC packets in the batch
>> for transmission should all be 984 bytes long, except maybe the last
>> one. Once the tag is attached, the new size of 1000 will correctly split
>> the QUIC packets further down the stack for transmission in individual
>> IP/UDP packets. The code is also saving processing time by sending all
>> packets at once to UDP in a single call, when GSO is enabled.
>>> I'm not sure if it's worth adding a ULP layer over UDP for this QUIC
>>> TX only. Honestly, I'm more supporting doing a full QUIC stack in the
>>> kernel independently with socket APIs to use it:
>>> https://github.com/lxin/tls_hs.
>>>
>>> Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next v2 0/6] net: support QUIC crypto
  2022-09-27 16:44         ` Adel Abouchaev
@ 2022-09-27 17:12           ` Willem de Bruijn
  2022-09-27 17:28             ` Adel Abouchaev
  0 siblings, 1 reply; 77+ messages in thread
From: Willem de Bruijn @ 2022-09-27 17:12 UTC (permalink / raw)
  To: Adel Abouchaev
  Cc: Willem de Bruijn, Xin Long, Jakub Kicinski, davem, Eric Dumazet,
	Paolo Abeni, Jonathan Corbet, David Ahern, shuah, imagedong,
	network dev, linux-doc, linux-kselftest

On Tue, Sep 27, 2022 at 12:45 PM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
>
>
> On 9/25/22 11:04 AM, Willem de Bruijn wrote:
> >>> The patch seems to get the crypto_ctx by doing a connection hash table
> >>> lookup in the sendmsg(), which is not good from the performance side.
> >>> One QUIC connection can go over multiple UDP sockets, but I don't
> >>> think one socket can be used by multiple QUIC connections. So why not
> >>> save the ctx in the socket instead?
> >> A single socket could have multiple connections originated from it,
> >> having different destinations, if the socket is not connected. An
> >> optimization could be made for connected sockets to cache the context
> >> and save time on a lookup. The measurement of kernel operations timing
> >> did not reveal a significant amount of time spent in this lookup due to
> >> a relatively small number of connections per socket in general. A shared
> >> table across multiple sockets might experience a different performance
> >> grading.
> > I'm late to this patch series, sorry. High quality implementation. I
> > have a few design questions similar to Xin.
> >
> > If multiplexing, instead of looking up a connection by { address, port
> > variable length connection ID }, perhaps return a connection table
> > index on setsockopt and use that in sendmsg.
>
>
> It was deliberate to not to return anything other than 0 from
> setsockopt() as defined in the spec for the function. Despite that it
> says "shall", the doc says that 0 is the only value for successful
> operation. This was the reason not to use setsockopt() for any
> bidirectional transfers of data and or status. A more sophisticated
> approach with netlink sockets would be more suitable for it. The second
> reason is the API asymmetry for Tx and Rx which will be introduced - the

I thought the cover letter indicated that due to asymmetry of most
QUIC workloads, only Tx offload is implemented. You do intend to add
Rx later?

> Rx will still need to match on the address, port and cid. The third
> reason is that in current implementations there are no more than a few
> connections per socket,

This is very different from how we do QUIC at Google. There we
definitely multiplex many connections across essentially a socket per
CPU IIRC.

> which does not abuse the rhashtable that does a
> lookup, although it takes time to hash the key into a hash for a seek.
> The performance measurement ran against the runtime and did not flag
> this path as underperforming either, there were other parts that
> substantially add to the runtime, not the key lookup though.
>
>
> >>> The patch is to reduce the copying operations between user space and
> >>> the kernel. I might miss something in your user space code, but the
> >>> msg to send is *already packed* into the Stream Frame in user space,
> >>> what's the difference if you encrypt it in userspace and then
> >>> sendmsg(udp_sk) with zero-copy to the kernel.
> >> It is possible to do it this way. Zero-copy works best with packet sizes
> >> starting at 32K and larger.  Anything less than that would consume the
> >> improvements of zero-copy by zero-copy pre/post operations and needs to
> >> align memory.
> > Part of the cost of MSG_ZEROCOPY is in mapping and unmapping user
> > pages. This series re-implements that with its own get_user_pages.
> > That is duplicative non-trivial code. And it will incur the same cost.
> > What this implementation saves is the (indeed non-trivial)
> > asynchronous completion notification over the error queue.
> >
> > The cover letter gives some performance numbers against a userspace
> > implementation that has to copy from user to kernel. It might be more
> > even to compare against an implementation using MSG_ZEROCOPY and
> > UDP_SEGMENT. A userspace crypto implementation may have other benefits
> > compared to a kernel implementation, such as not having to convert to
> > crypto API scatter-gather arrays and back to network structures.
> >
> > A few related points
> >
> > - The implementation support multiplexed connections, but only one
> > crypto sendmsg can be outstanding at any time:
> >
> >    + /**
> >    + * To synchronize concurrent sendmsg() requests through the same socket
> >    + * and protect preallocated per-context memory.
> >    + **/
> >    + struct mutex sendmsg_mux;
> >
> > That is quite limiting for production workloads.
>
> The use case that we have with MVFST library currently runs a single
> worker for a connection and has a single socket attached to it. QUIC
> allows simultaneous use of multiple connection IDs to swap them in
> runtime, and implementation would request only a handful of these. The
> MVFST batches writes into a block of about 8Kb and then uses GSO to send
> them all at once.
>
> > - Crypto operations are also executed synchronously, using
> > crypto_wait_req after each operationn. This limits throughput by using
> > at most one core per UDP socket. And adds sendmsg latency (which may
> > or may not be important to the application). Wireguard shows an
> > example of how to parallelize software crypto across cores.
> >
> > - The implementation avoids dynamic allocation of cipher text pages by
> > using a single ctx->cipher_page. This is protected by sendmsg_mux (see
> > above). Is that safe when packets leave the protocol stack and are
> > then held in a qdisc or when being processed by the NIC?
> > quic_sendmsg_locked will return, but the cipher page is not free to
> > reuse yet.
> There is currently no use case that we have in hands that requires
> parallel transmission of data for the same connection. Multiple
> connections would have no issue running in parallel as each of them will
> have it's own preallocated cipher_page in the context.

This still leaves the point that sendmsg may return and the mutex
released while the cipher_page is still associated with an skb in the
transmit path.

> There is a fragmentation further down the stack with
> ip_generic_getfrag() that eventually does copy_from_iter() and makea a
> copy of the data. This is executed as part of __ip_append_data() called
> from udp_sendmsg() in ipv4/udp.c. The assumption was that this is
> executed synchronously and the queues and NIC will see a mapping of a
> different memory area than the ciphertext in the pre-allocated page.
>
> >
> > - The real benefit of kernel QUIC will come from HW offload. Would it
> > be better to avoid the complexity of an in-kernel software
> > implementation and only focus on HW offload? Basically, pass the
> > plaintext QUIC packets over a standard UDP socket and alongside in a
> > cmsg pass either an index into a HW security association database or
> > the immediate { key, iv } connection_info (for stateless sockets), to
> > be encoded into the descriptor by the device driver.
> Hardware usually targets a single ciphersuite such as AES-GCM-128/256,
> while QUIC also supports Chacha20-Poly1305 and AES-CCM. The generalized
> support for offload prompted implementation of these ciphers in kernel
> code.

All userspace libraries also support all protocols as a fallback. No
need for two fallbacks if HW support is missing?

> The kernel code could also engage if the future hardware has
> capacity caps preventing it from handling all requests in the hardware.
> > - With such a simpler path, could we avoid introducing ULP and just
> > have udp [gs]etsockopt CRYPTO_STATE. Where QUIC is the only defined
> > state type yet.
> >
> > - Small aside: as the series introduces new APIs with non-trivial
> > parsing in the kernel, it's good to run a fuzzer like syzkaller on it
> > (if not having done so yet).
> Agreed.
> >> The other possible obstacle would be that eventual support
> >> of QUIC encryption and decryption in hardware would integrate well with
> >> this current approach.
> >>> Didn't really understand the "GSO" you mentioned, as I don't see any
> >>> code about kernel GSO, I guess it's just "Fragment size", right?
> >>> BTW, it‘s not common to use "//" for the kernel annotation.
> > minor point: fragment has meaning in IPv4. For GSO, prefer gso_size.
> Sure, will change it to gso_size.
> >
> >> Once the payload arrives into the kernel, the GSO on the interface would
> >> instruct L3/L4 stack on fragmentation. In this case, the plaintext QUIC
> >> packets should be aligned on the GSO marks less the tag size that would
> >> be added by encryption. For GSO size 1000, the QUIC packets in the batch
> >> for transmission should all be 984 bytes long, except maybe the last
> >> one. Once the tag is attached, the new size of 1000 will correctly split
> >> the QUIC packets further down the stack for transmission in individual
> >> IP/UDP packets. The code is also saving processing time by sending all
> >> packets at once to UDP in a single call, when GSO is enabled.
> >>> I'm not sure if it's worth adding a ULP layer over UDP for this QUIC
> >>> TX only. Honestly, I'm more supporting doing a full QUIC stack in the
> >>> kernel independently with socket APIs to use it:
> >>> https://github.com/lxin/tls_hs.
> >>>
> >>> Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [net-next v2 0/6] net: support QUIC crypto
  2022-09-27 17:12           ` Willem de Bruijn
@ 2022-09-27 17:28             ` Adel Abouchaev
  0 siblings, 0 replies; 77+ messages in thread
From: Adel Abouchaev @ 2022-09-27 17:28 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Xin Long, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Jonathan Corbet, David Ahern, shuah, imagedong, network dev,
	linux-doc, linux-kselftest


On 9/27/22 10:12 AM, Willem de Bruijn wrote:
> On Tue, Sep 27, 2022 at 12:45 PM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
>>
>> On 9/25/22 11:04 AM, Willem de Bruijn wrote:
>>>>> The patch seems to get the crypto_ctx by doing a connection hash table
>>>>> lookup in the sendmsg(), which is not good from the performance side.
>>>>> One QUIC connection can go over multiple UDP sockets, but I don't
>>>>> think one socket can be used by multiple QUIC connections. So why not
>>>>> save the ctx in the socket instead?
>>>> A single socket could have multiple connections originated from it,
>>>> having different destinations, if the socket is not connected. An
>>>> optimization could be made for connected sockets to cache the context
>>>> and save time on a lookup. The measurement of kernel operations timing
>>>> did not reveal a significant amount of time spent in this lookup due to
>>>> a relatively small number of connections per socket in general. A shared
>>>> table across multiple sockets might experience a different performance
>>>> grading.
>>> I'm late to this patch series, sorry. High quality implementation. I
>>> have a few design questions similar to Xin.
>>>
>>> If multiplexing, instead of looking up a connection by { address, port
>>> variable length connection ID }, perhaps return a connection table
>>> index on setsockopt and use that in sendmsg.
>>
>> It was deliberate to not to return anything other than 0 from
>> setsockopt() as defined in the spec for the function. Despite that it
>> says "shall", the doc says that 0 is the only value for successful
>> operation. This was the reason not to use setsockopt() for any
>> bidirectional transfers of data and or status. A more sophisticated
>> approach with netlink sockets would be more suitable for it. The second
>> reason is the API asymmetry for Tx and Rx which will be introduced - the
> I thought the cover letter indicated that due to asymmetry of most
> QUIC workloads, only Tx offload is implemented. You do intend to add
> Rx later?
We are planning to include Rx later as well. Tx is more compelling
from a use case perspective, as the main application of this would be
at the edge, where Tx is heavy and Rx is much lighter.
>> Rx will still need to match on the address, port and cid. The third
>> reason is that in current implementations there are no more than a few
>> connections per socket,
> This is very different from how we do QUIC at Google. There we
> definitely multiplex many connections across essentially a socket per
> CPU IIRC.
Ian mentioned that in such a case, when a zero-length QUIC CID is
used, the remote ends are distinct ephemeral ports. Each connection
will then have its own context and, yes, it will load the hash table
for that socket quite a bit. The limiting factor is still the BSD 4.4
sockets interface, which does not lend itself to returning the entry
ID from setsockopt(). It might be easier to plug this in once a
standard netlink interface is available for it. However, that would
further complicate userspace library integration, which is already a
non-trivial task today.
>
>> which does not abuse the rhashtable that does a
>> lookup, although it takes time to hash the key into a hash for a seek.
>> The performance measurement ran against the runtime and did not flag
>> this path as underperforming either, there were other parts that
>> substantially add to the runtime, not the key lookup though.
>>
>>
>>>>> The patch is to reduce the copying operations between user space and
>>>>> the kernel. I might miss something in your user space code, but the
>>>>> msg to send is *already packed* into the Stream Frame in user space,
>>>>> what's the difference if you encrypt it in userspace and then
>>>>> sendmsg(udp_sk) with zero-copy to the kernel.
>>>> It is possible to do it this way. Zero-copy works best with packet sizes
>>>> starting at 32K and larger.  Anything less than that would consume the
>>>> improvements of zero-copy by zero-copy pre/post operations and needs to
>>>> align memory.
>>> Part of the cost of MSG_ZEROCOPY is in mapping and unmapping user
>>> pages. This series re-implements that with its own get_user_pages.
>>> That is duplicative non-trivial code. And it will incur the same cost.
>>> What this implementation saves is the (indeed non-trivial)
>>> asynchronous completion notification over the error queue.
>>>
>>> The cover letter gives some performance numbers against a userspace
>>> implementation that has to copy from user to kernel. It might be more
>>> even to compare against an implementation using MSG_ZEROCOPY and
>>> UDP_SEGMENT. A userspace crypto implementation may have other benefits
>>> compared to a kernel implementation, such as not having to convert to
>>> crypto API scatter-gather arrays and back to network structures.
>>>
>>> A few related points
>>>
>>> - The implementation support multiplexed connections, but only one
>>> crypto sendmsg can be outstanding at any time:
>>>
>>>     + /**
>>>     + * To synchronize concurrent sendmsg() requests through the same socket
>>>     + * and protect preallocated per-context memory.
>>>     + **/
>>>     + struct mutex sendmsg_mux;
>>>
>>> That is quite limiting for production workloads.
>> The use case that we have with MVFST library currently runs a single
>> worker for a connection and has a single socket attached to it. QUIC
>> allows simultaneous use of multiple connection IDs to swap them in
>> runtime, and implementation would request only a handful of these. The
>> MVFST batches writes into a block of about 8Kb and then uses GSO to send
>> them all at once.
>>
>>> - Crypto operations are also executed synchronously, using
>>> crypto_wait_req after each operationn. This limits throughput by using
>>> at most one core per UDP socket. And adds sendmsg latency (which may
>>> or may not be important to the application). Wireguard shows an
>>> example of how to parallelize software crypto across cores.
>>>
>>> - The implementation avoids dynamic allocation of cipher text pages by
>>> using a single ctx->cipher_page. This is protected by sendmsg_mux (see
>>> above). Is that safe when packets leave the protocol stack and are
>>> then held in a qdisc or when being processed by the NIC?
>>> quic_sendmsg_locked will return, but the cipher page is not free to
>>> reuse yet.
>> There is currently no use case that we have in hands that requires
>> parallel transmission of data for the same connection. Multiple
>> connections would have no issue running in parallel as each of them will
>> have it's own preallocated cipher_page in the context.
> This still leaves the point that sendmsg may return and the mutex
> released while the cipher_page is still associated with an skb in the
> transmit path.
Correct, there is a further copy from the cipher buffer into
fragmented pieces, done by ip_generic_getfrag(). Am I reading it wrong
that the allocated ciphertext buffer is no longer needed once that
copy is done and all the data has been moved into further structures
with copy_from_iter()? The skb would be built on a different buffer
space than the encrypted data. udp_sendmsg() assumes that it receives
the memory from userspace and performs all the operations to move data
from userspace. While doing so, it omits many of its checks since the
memory is already in the kernel, and still executes quickly.
>
>> There is a fragmentation further down the stack with
>> ip_generic_getfrag() that eventually does copy_from_iter() and makea a
>> copy of the data. This is executed as part of __ip_append_data() called
>> from udp_sendmsg() in ipv4/udp.c. The assumption was that this is
>> executed synchronously and the queues and NIC will see a mapping of a
>> different memory area than the ciphertext in the pre-allocated page.
>>
>>> - The real benefit of kernel QUIC will come from HW offload. Would it
>>> be better to avoid the complexity of an in-kernel software
>>> implementation and only focus on HW offload? Basically, pass the
>>> plaintext QUIC packets over a standard UDP socket and alongside in a
>>> cmsg pass either an index into a HW security association database or
>>> the immediate { key, iv } connection_info (for stateless sockets), to
>>> be encoded into the descriptor by the device driver.
>> Hardware usually targets a single ciphersuite such as AES-GCM-128/256,
>> while QUIC also supports Chacha20-Poly1305 and AES-CCM. The generalized
>> support for offload prompted implementation of these ciphers in kernel
>> code.
> All userspace libraries also support all protocols as fall-back. No
> need for two fall-backs if HW support is missing?
It could be done that way. Looking at TLS, it does have a kernel
fallback in tls_sw, although that is dual-purpose code.
>> The kernel code could also engage if the future hardware has
>> capacity caps preventing it from handling all requests in the hardware.
>>> - With such a simpler path, could we avoid introducing ULP and just
>>> have udp [gs]etsockopt CRYPTO_STATE. Where QUIC is the only defined
>>> state type yet.
>>>
>>> - Small aside: as the series introduces new APIs with non-trivial
>>> parsing in the kernel, it's good to run a fuzzer like syzkaller on it
>>> (if not having done so yet).
>> Agreed.
>>>> The other possible obstacle would be that eventual support
>>>> of QUIC encryption and decryption in hardware would integrate well with
>>>> this current approach.
>>>>> Didn't really understand the "GSO" you mentioned, as I don't see any
>>>>> code about kernel GSO, I guess it's just "Fragment size", right?
>>>>> BTW, it‘s not common to use "//" for the kernel annotation.
>>> minor point: fragment has meaning in IPv4. For GSO, prefer gso_size.
>> Sure, will change it to gso_size.
>>>> Once the payload arrives into the kernel, the GSO on the interface would
>>>> instruct L3/L4 stack on fragmentation. In this case, the plaintext QUIC
>>>> packets should be aligned on the GSO marks less the tag size that would
>>>> be added by encryption. For GSO size 1000, the QUIC packets in the batch
>>>> for transmission should all be 984 bytes long, except maybe the last
>>>> one. Once the tag is attached, the new size of 1000 will correctly split
>>>> the QUIC packets further down the stack for transmission in individual
>>>> IP/UDP packets. The code is also saving processing time by sending all
>>>> packets at once to UDP in a single call, when GSO is enabled.
>>>>> I'm not sure if it's worth adding a ULP layer over UDP for this QUIC
>>>>> TX only. Honestly, I'm more supporting doing a full QUIC stack in the
>>>>> kernel independently with socket APIs to use it:
>>>>> https://github.com/lxin/tls_hs.
>>>>>
>>>>> Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2022-09-27 17:28 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Adel Abouchaev <adel.abushaev@gmail.com>
2022-08-01 19:52 ` [RFC net-next 0/6] net: support QUIC crypto Adel Abouchaev
2022-08-01 19:52   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
2022-08-01 19:52   ` [RFC net-next 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
2022-08-01 19:52   ` [RFC net-next 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
2022-08-01 19:52   ` [RFC net-next 4/6] net: Implement QUIC offload functions Adel Abouchaev
2022-08-01 19:52   ` [RFC net-next 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
2022-08-01 19:52   ` [RFC net-next 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
2022-08-05  3:37   ` [RFC net-next 0/6] net: support QUIC crypto Bagas Sanjaya
2022-08-03 16:40 ` Adel Abouchaev
2022-08-03 16:40   ` [RFC net-next 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
2022-08-03 18:23     ` Andrew Lunn
2022-08-03 18:51       ` Adel Abouchaev
2022-08-04 15:29         ` Andrew Lunn
2022-08-04 16:57           ` Adel Abouchaev
2022-08-04 17:00             ` Eric Dumazet
2022-08-04 18:09               ` Jakub Kicinski
2022-08-04 18:45                 ` Eric Dumazet
2022-08-04 13:57     ` Jonathan Corbet
2022-08-03 16:40   ` [RFC net-next 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
2022-08-03 16:40   ` [RFC net-next 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
2022-08-03 16:40   ` [RFC net-next 4/6] net: Implement QUIC offload functions Adel Abouchaev
2022-08-03 16:40   ` [RFC net-next 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
2022-08-03 16:40   ` [RFC net-next 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
2022-08-06  0:11 ` [RFC net-next v2 0/6] net: support QUIC crypto Adel Abouchaev
2022-08-06  0:11   ` [RFC net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
2022-08-06  3:05     ` Bagas Sanjaya
2022-08-08 19:05       ` Adel Abouchaev
2022-08-06  0:11   ` [RFC net-next v2 2/6] Define QUIC specific constants, control and data plane structures Adel Abouchaev
2022-08-06  0:11   ` [RFC net-next v2 3/6] Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
2022-08-06  0:11   ` [RFC net-next v2 4/6] Implement QUIC offload functions Adel Abouchaev
2022-08-06  0:11   ` [RFC net-next v2 5/6] Add flow counters and Tx processing error counter Adel Abouchaev
2022-08-06  0:11   ` [RFC net-next v2 6/6] Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
2022-08-16 18:11 ` [net-next 0/6] net: support QUIC crypto Adel Abouchaev
2022-08-16 18:11   ` [net-next 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
2022-08-16 18:11   ` [net-next 2/6] Define QUIC specific constants, control and data plane structures Adel Abouchaev
2022-08-16 18:11   ` [net-next 3/6] Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
2022-08-16 18:11   ` [net-next 4/6] Implement QUIC offload functions Adel Abouchaev
2022-08-16 18:11   ` [net-next 5/6] Add flow counters and Tx processing error counter Adel Abouchaev
2022-08-16 18:11   ` [net-next 6/6] Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
2022-08-17  8:09   ` [net-next 0/6] net: support QUIC crypto Bagas Sanjaya
2022-08-17 18:49     ` Adel Abouchaev
2022-08-17 20:09 ` [net-next v2 " Adel Abouchaev
2022-08-17 20:09   ` [net-next v2 1/6] Documentation on QUIC kernel Tx crypto Adel Abouchaev
2022-08-18  2:53     ` Bagas Sanjaya
2022-08-17 20:09   ` [net-next v2 2/6] Define QUIC specific constants, control and data plane structures Adel Abouchaev
2022-08-17 20:09   ` [net-next v2 3/6] Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
2022-08-17 20:09   ` [net-next v2 4/6] Implement QUIC offload functions Adel Abouchaev
2022-08-17 20:09   ` [net-next v2 5/6] Add flow counters and Tx processing error counter Adel Abouchaev
2022-08-17 20:09   ` [net-next v2 6/6] Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
2022-08-18  2:18   ` [net-next v2 0/6] net: support QUIC crypto Bagas Sanjaya
2022-08-24 18:29   ` Xin Long
2022-08-24 19:52     ` Matt Joras
2022-08-24 23:09     ` Adel Abouchaev
2022-09-25 18:04       ` Willem de Bruijn
2022-09-27 16:44         ` Adel Abouchaev
2022-09-27 17:12           ` Willem de Bruijn
2022-09-27 17:28             ` Adel Abouchaev
2022-08-24 18:43 ` [net-next] Fix reinitialization of TEST_PROGS in net self tests Adel Abouchaev
2022-08-24 20:12   ` Shuah Khan
2022-08-25 20:30   ` patchwork-bot+netdevbpf
2022-09-07  0:49 ` [net-next v3 0/6] net: support QUIC crypto Adel Abouchaev
2022-09-07  0:49   ` [net-next v3 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
2022-09-07  3:38     ` Bagas Sanjaya
2022-09-07 17:29       ` Adel Abouchaev
2022-09-07  0:49   ` [net-next v3 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
2022-09-07  0:49   ` [net-next v3 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
2022-09-07  0:49   ` [net-next v3 4/6] net: Implement QUIC offload functions Adel Abouchaev
2022-09-07  0:49   ` [net-next v3 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
2022-09-07  0:49   ` [net-next v3 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
2022-09-09  0:12 ` [net-next v4 0/6] net: support QUIC crypto Adel Abouchaev
2022-09-09  0:12   ` [net-next v4 1/6] net: Documentation on QUIC kernel Tx crypto Adel Abouchaev
2022-09-09  1:40     ` Bagas Sanjaya
2022-09-09  0:12   ` [net-next v4 2/6] net: Define QUIC specific constants, control and data plane structures Adel Abouchaev
2022-09-09  0:12   ` [net-next v4 3/6] net: Add UDP ULP operations, initialization and handling prototype functions Adel Abouchaev
2022-09-09  0:12   ` [net-next v4 4/6] net: Implement QUIC offload functions Adel Abouchaev
2022-09-09  0:12   ` [net-next v4 5/6] net: Add flow counters and Tx processing error counter Adel Abouchaev
2022-09-09  0:12   ` [net-next v4 6/6] net: Add self tests for ULP operations, flow setup and crypto tests Adel Abouchaev
