All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/14] NFS: add AF_VSOCK support
@ 2017-06-30 13:23 Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 01/14] SUNRPC: add AF_VSOCK support to addr.[ch] Stefan Hajnoczi
                   ` (13 more replies)
  0 siblings, 14 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

v3:
 * Now with nfsd support so the full stack can be tested

This patch series enables AF_VSOCK address family support in the NFS client and
nfsd.  You can also get the code here:
https://github.com/stefanha/linux/tree/vsock-nfsd

Please also see the nfs-utils patch series I have just sent to
linux-nfs@vger.kernel.org for the necessary patches.  You can get the code
here:
https://github.com/stefanha/nfs-utils/tree/vsock-nfsd

The AF_VSOCK address family provides socket communication between virtual
machines and hypervisors.  VMware VMCI and virtio (for KVM) transports are
available in Linux, see net/vmw_vsock/.

The goal of this work is sharing files between virtual machines and
hypervisors.  AF_VSOCK is well-suited to this because it requires no
configuration inside the virtual machine, making it simple to manage and
reliable.

Why NFS over AF_VSOCK?
----------------------
It is unusual to add a new NFS transport, only TCP, RDMA, and UDP are currently
supported.  Here is the rationale for adding AF_VSOCK.

Sharing files with a virtual machine can be configured manually:
1. Add a dedicated network card to the virtual machine.  It will be used for
   NFS traffic.
2. Configure a local subnet and assign IP addresses to the virtual machine and
   hypervisor
3. Configure an NFS export on the hypervisor and start the NFS server
4. Mount the export inside the virtual machine

Automating these steps poses a problem: modifying network configuration inside
the virtual machine is invasive.  It's hard to add a network interface to an
arbitrary running system in an automated fashion, considering the diversity in
network management tools, firewall rules, IP address usage, etc.

Furthermore, the user may disrupt file sharing by accident when they add
firewall rules, restart networking, etc because the NFS network interface is
prone to interference alongside the network interfaces managed by the user.

AF_VSOCK is a zero-configuration network transport that avoids these problems.
Adding it to a virtual machine is non-invasive.  It also avoids accidental
misconfiguration by the user.  This is why "guest agents" and other services in
various hypervisors (KVM, Xen, VMware, VirtualBox) do not use regular network
interfaces.

Instead of implementing a paravirtualized filesystem it makes more sense to use
NFS, which is mature and well-understood.  This is why this patch series adds
AF_VSOCK support to NFS.

The approach in this series
---------------------------
AF_VSOCK stream sockets can be used for NFSv4.1 much in the same way as TCP.
RFC 1831 record fragments divide messages since SOCK_STREAM semantics are
present.  The backchannel shares the connection just like the default TCP
configuration.

Addresses are <Context ID, Port Number> pairs.  These patches use "vsock:<cid>"
string representation to distinguish AF_VSOCK addresses from IPv4 and IPv6
numeric addresses.

The following nfsd /proc changes are needed:

 * /proc/net/rpc/auth.unix.ip - new 'vsock:CID' syntax
 * /proc/fs/nfsd/portlist - new 'vsock' transport and
                            accept AF_VSOCK socket fds

Quickstart
----------
1. Build these patches or clone from git:
   https://github.com/stefanha/linux/tree/vsock-nfsd

   Config options:
   CONFIG_VSOCKETS=m
   CONFIG_VIRTIO_VSOCKETS=m
   CONFIG_VIRTIO_VSOCKETS_COMMON=m
   CONFIG_SUNRPC_XPRT_VSOCK=y
   CONFIG_VHOST_VSOCK=m

   Install this kernel on the host and inside the guest.

2. Build nfs-utils from git:
   https://github.com/stefanha/nfs-utils/tree/vsock-nfsd

   Install nfs-utils on the host and inside the guest.

3. Define a vsock export on the host:

   (host)# cat /etc/exports
   /export	vsock:*(rw,no_root_squash,insecure,subtree_check)

4. Ensure the host has AF_VSOCK set up

   (host)# modprobe vhost_vsock

5. Start nfsd

   (host)# systemctl start var-lib-nfs-rpc_pipefs.mount
   (host)# systemctl start proc-fs-nfsd.mount
   (host)# systemctl start rpcbind.socket rpcbind.service
   (host)# rpc.mountd
   (host)# exportfs -r
   (host)# rpc.nfsd -N3 -V4.1 --vsock 2049

6. Launch the guest

   (host)# qemu-system-x86_64 -M accel=kvm -m 1G \
             -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3 \
             ...

   (Check whether your qemu-system-x86_64 binary supports vsock using
   "qemu-system-x86_64 -device \? 2>&1 | grep vsock".  If not, build
   QEMU from git://git.qemu-project.org/qemu.git master.)

7. Mount the export from the guest

   The following example mounts /export from the hypervisor (CID 2)
   inside the virtual machine (CID 3):

   (guest)# mount.nfs 2:/export /mnt -o clientaddr=3,proto=vsock

Status
------
Tested with basic NFSv4.1 file I/O.  Advanced NFS features may require
additional changes.

Please let me know your comments or questions.

Thanks,
Stefan

Stefan Hajnoczi (14):
  SUNRPC: add AF_VSOCK support to addr.[ch]
  SUNRPC: rename "TCP" record parser to "stream" parser
  SUNRPC: abstract tcp_read_sock() in record fragment parser
  SUNRPC: extract xs_stream_reset_state()
  VSOCK: add tcp_read_sock()-like vsock_read_sock() function
  SUNRPC: add AF_VSOCK support to xprtsock.c
  SUNRPC: drop unnecessary svc_bc_tcp_create() helper
  SUNRPC: add AF_VSOCK support to svc_xprt.c
  SUNRPC: add AF_VSOCK backchannel support
  NFS: add AF_VSOCK support to NFS client
  nfsd: support vsock xprt creation
  SUNRPC: add AF_VSOCK lock class
  SUNRPC: vsock svcsock support
  SUNRPC: add AF_VSOCK support to auth.unix.ip

 include/linux/sunrpc/addr.h             |  44 ++
 include/linux/sunrpc/svc_xprt.h         |  12 +
 include/linux/sunrpc/xprt.h             |   1 +
 include/linux/sunrpc/xprtsock.h         |  36 +-
 include/linux/virtio_vsock.h            |   4 +
 include/net/af_vsock.h                  |   5 +
 include/trace/events/sunrpc.h           |  26 +-
 drivers/vhost/vsock.c                   |   1 +
 fs/nfs/client.c                         |   2 +
 fs/nfs/super.c                          |  11 +-
 fs/nfsd/nfsctl.c                        |  23 +-
 net/sunrpc/addr.c                       |  57 +++
 net/sunrpc/svc_xprt.c                   |  18 +
 net/sunrpc/svcauth_unix.c               | 146 +++++--
 net/sunrpc/svcsock.c                    | 271 ++++++++++--
 net/sunrpc/xprtsock.c                   | 701 +++++++++++++++++++++++++-------
 net/vmw_vsock/af_vsock.c                |  16 +
 net/vmw_vsock/virtio_transport.c        |   1 +
 net/vmw_vsock/virtio_transport_common.c |  66 +++
 net/vmw_vsock/vmci_transport.c          |   8 +
 net/sunrpc/Kconfig                      |  10 +
 21 files changed, 1206 insertions(+), 253 deletions(-)

-- 
2.9.4


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 01/14] SUNRPC: add AF_VSOCK support to addr.[ch]
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 02/14] SUNRPC: rename "TCP" record parser to "stream" parser Stefan Hajnoczi
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

AF_VSOCK addresses are a Context ID (CID) and port number tuple.  The
CID is a unique address, similar to a IP address on a local subnet.

Extend the addr.h functions to handle AF_VSOCK addresses.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v2:
 * Replace CONFIG_VSOCKETS with CONFIG_SUNRPC_XPRT_VSOCK to prevent
   build failures when SUNRPC=y and VSOCKETS=m.  Built-in code cannot
   link against code in a module.
---
 include/linux/sunrpc/addr.h | 44 ++++++++++++++++++++++++++++++++++
 net/sunrpc/addr.c           | 57 +++++++++++++++++++++++++++++++++++++++++++++
 net/sunrpc/Kconfig          | 10 ++++++++
 3 files changed, 111 insertions(+)

diff --git a/include/linux/sunrpc/addr.h b/include/linux/sunrpc/addr.h
index 5c9c6cd..c4169bc 100644
--- a/include/linux/sunrpc/addr.h
+++ b/include/linux/sunrpc/addr.h
@@ -10,6 +10,7 @@
 #include <linux/socket.h>
 #include <linux/in.h>
 #include <linux/in6.h>
+#include <linux/vm_sockets.h>
 #include <net/ipv6.h>
 
 size_t		rpc_ntop(const struct sockaddr *, char *, const size_t);
@@ -26,6 +27,8 @@ static inline unsigned short rpc_get_port(const struct sockaddr *sap)
 		return ntohs(((struct sockaddr_in *)sap)->sin_port);
 	case AF_INET6:
 		return ntohs(((struct sockaddr_in6 *)sap)->sin6_port);
+	case AF_VSOCK:
+		return ((struct sockaddr_vm *)sap)->svm_port;
 	}
 	return 0;
 }
@@ -40,6 +43,9 @@ static inline void rpc_set_port(struct sockaddr *sap,
 	case AF_INET6:
 		((struct sockaddr_in6 *)sap)->sin6_port = htons(port);
 		break;
+	case AF_VSOCK:
+		((struct sockaddr_vm *)sap)->svm_port = port;
+		break;
 	}
 }
 
@@ -106,6 +112,40 @@ static inline bool __rpc_copy_addr6(struct sockaddr *dst,
 }
 #endif	/* !(IS_ENABLED(CONFIG_IPV6) */
 
+#if IS_ENABLED(CONFIG_VSOCKETS)
+static inline bool rpc_cmp_vsock_addr(const struct sockaddr *sap1,
+				      const struct sockaddr *sap2)
+{
+	const struct sockaddr_vm *svm1 = (const struct sockaddr_vm *)sap1;
+	const struct sockaddr_vm *svm2 = (const struct sockaddr_vm *)sap2;
+
+	return svm1->svm_cid == svm2->svm_cid;
+}
+
+static inline bool __rpc_copy_vsock_addr(struct sockaddr *dst,
+					 const struct sockaddr *src)
+{
+	const struct sockaddr_vm *ssvm = (const struct sockaddr_vm *)src;
+	struct sockaddr_vm *dsvm = (struct sockaddr_vm *)dst;
+
+	dsvm->svm_family = ssvm->svm_family;
+	dsvm->svm_cid = ssvm->svm_cid;
+	return true;
+}
+#else	/* !(IS_ENABLED(CONFIG_VSOCKETS) */
+static inline bool rpc_cmp_vsock_addr(const struct sockaddr *sap1,
+				      const struct sockaddr *sap2)
+{
+	return false;
+}
+
+static inline bool __rpc_copy_vsock_addr(struct sockaddr *dst,
+					 const struct sockaddr *src)
+{
+	return false;
+}
+#endif	/* !(IS_ENABLED(CONFIG_VSOCKETS) */
+
 /**
  * rpc_cmp_addr - compare the address portion of two sockaddrs.
  * @sap1: first sockaddr
@@ -125,6 +165,8 @@ static inline bool rpc_cmp_addr(const struct sockaddr *sap1,
 			return rpc_cmp_addr4(sap1, sap2);
 		case AF_INET6:
 			return rpc_cmp_addr6(sap1, sap2);
+		case AF_VSOCK:
+			return rpc_cmp_vsock_addr(sap1, sap2);
 		}
 	}
 	return false;
@@ -161,6 +203,8 @@ static inline bool rpc_copy_addr(struct sockaddr *dst,
 		return __rpc_copy_addr4(dst, src);
 	case AF_INET6:
 		return __rpc_copy_addr6(dst, src);
+	case AF_VSOCK:
+		return __rpc_copy_vsock_addr(dst, src);
 	}
 	return false;
 }
diff --git a/net/sunrpc/addr.c b/net/sunrpc/addr.c
index 2e0a6f9..f4dd962 100644
--- a/net/sunrpc/addr.c
+++ b/net/sunrpc/addr.c
@@ -16,11 +16,14 @@
  * RFC 4291, Section 2.2 for details on IPv6 presentation formats.
  */
 
+ /* TODO register netid and uaddr with IANA? (See RFC 5665 5.1/5.2) */
+
 #include <net/ipv6.h>
 #include <linux/sunrpc/addr.h>
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/vm_sockets.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
 
@@ -108,6 +111,26 @@ static size_t rpc_ntop6(const struct sockaddr *sap,
 
 #endif	/* !IS_ENABLED(CONFIG_IPV6) */
 
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+
+static size_t rpc_ntop_vsock(const struct sockaddr *sap,
+			     char *buf, const size_t buflen)
+{
+	const struct sockaddr_vm *svm = (struct sockaddr_vm *)sap;
+
+	return snprintf(buf, buflen, "%u", svm->svm_cid);
+}
+
+#else	/* !CONFIG_SUNRPC_XPRT_VSOCK */
+
+static size_t rpc_ntop_vsock(const struct sockaddr *sap,
+			     char *buf, const size_t buflen)
+{
+	return 0;
+}
+
+#endif	/* !CONFIG_SUNRPC_XPRT_VSOCK */
+
 static int rpc_ntop4(const struct sockaddr *sap,
 		     char *buf, const size_t buflen)
 {
@@ -132,6 +155,8 @@ size_t rpc_ntop(const struct sockaddr *sap, char *buf, const size_t buflen)
 		return rpc_ntop4(sap, buf, buflen);
 	case AF_INET6:
 		return rpc_ntop6(sap, buf, buflen);
+	case AF_VSOCK:
+		return rpc_ntop_vsock(sap, buf, buflen);
 	}
 
 	return 0;
@@ -229,6 +254,34 @@ static size_t rpc_pton6(struct net *net, const char *buf, const size_t buflen,
 }
 #endif
 
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+static size_t rpc_pton_vsock(const char *buf, const size_t buflen,
+			     struct sockaddr *sap, const size_t salen)
+{
+	const size_t prefix_len = strlen("vsock:");
+	struct sockaddr_vm *svm = (struct sockaddr_vm *)sap;
+	unsigned int cid;
+
+	if (strncmp(buf, "vsock:", prefix_len) != 0 ||
+	    salen < sizeof(struct sockaddr_vm))
+		return 0;
+
+	if (kstrtouint(buf + prefix_len, 10, &cid) != 0)
+		return 0;
+
+	memset(svm, 0, sizeof(struct sockaddr_vm));
+	svm->svm_family = AF_VSOCK;
+	svm->svm_cid = cid;
+	return sizeof(struct sockaddr_vm);
+}
+#else
+static size_t rpc_pton_vsock(const char *buf, const size_t buflen,
+			     struct sockaddr *sap, const size_t salen)
+{
+	return 0;
+}
+#endif
+
 /**
  * rpc_pton - Construct a sockaddr in @sap
  * @net: applicable network namespace
@@ -249,6 +302,10 @@ size_t rpc_pton(struct net *net, const char *buf, const size_t buflen,
 {
 	unsigned int i;
 
+	/* TODO is there a nicer way to distinguish vsock addresses? */
+	if (strncmp(buf, "vsock:", 6) == 0)
+		return rpc_pton_vsock(buf, buflen, sap, salen);
+
 	for (i = 0; i < buflen; i++)
 		if (buf[i] == ':')
 			return rpc_pton6(net, buf, buflen, sap, salen);
diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig
index ac09ca8..4fc6550 100644
--- a/net/sunrpc/Kconfig
+++ b/net/sunrpc/Kconfig
@@ -62,3 +62,13 @@ config SUNRPC_XPRT_RDMA
 
 	  If unsure, or you know there is no RDMA capability on your
 	  hardware platform, say N.
+
+config SUNRPC_XPRT_VSOCK
+	bool "RPC-over-AF_VSOCK transport"
+	depends on SUNRPC && VSOCKETS && !(SUNRPC=y && VSOCKETS=m)
+	default SUNRPC && VSOCKETS
+	help
+	  This option allows the NFS client and server to use the AF_VSOCK
+	  transport to communicate between virtual machines and the host.
+
+	  If unsure, say Y.
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 02/14] SUNRPC: rename "TCP" record parser to "stream" parser
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 01/14] SUNRPC: add AF_VSOCK support to addr.[ch] Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 03/14] SUNRPC: abstract tcp_read_sock() in record fragment parser Stefan Hajnoczi
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

The TCP record parser is really a RFC 1831 record fragment parser.
There is nothing TCP protocol-specific about parsing record fragments.
The parser can be reused for any SOCK_STREAM socket.

This patch renames functions and fields but xs_stream_data_ready() still
calls tcp_read_sock().  This is addressed in the next patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/linux/sunrpc/xprtsock.h |  31 ++---
 include/trace/events/sunrpc.h   |  26 ++--
 net/sunrpc/xprtsock.c           | 262 ++++++++++++++++++++--------------------
 3 files changed, 160 insertions(+), 159 deletions(-)

diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index c9959d7..6749774 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -27,17 +27,18 @@ struct sock_xprt {
 	struct sock *		inet;
 
 	/*
-	 * State of TCP reply receive
+	 * State of SOCK_STREAM reply receive
 	 */
-	__be32			tcp_fraghdr,
-				tcp_xid,
-				tcp_calldir;
+	__be32			stream_fraghdr,
+				stream_xid,
+				stream_calldir;
 
-	u32			tcp_offset,
-				tcp_reclen;
+	u32			stream_offset,
+				stream_reclen;
+
+	unsigned long		stream_copied,
+				stream_flags;
 
-	unsigned long		tcp_copied,
-				tcp_flags;
 
 	/*
 	 * Connection of transports
@@ -69,17 +70,17 @@ struct sock_xprt {
 /*
  * TCP receive state flags
  */
-#define TCP_RCV_LAST_FRAG	(1UL << 0)
-#define TCP_RCV_COPY_FRAGHDR	(1UL << 1)
-#define TCP_RCV_COPY_XID	(1UL << 2)
-#define TCP_RCV_COPY_DATA	(1UL << 3)
-#define TCP_RCV_READ_CALLDIR	(1UL << 4)
-#define TCP_RCV_COPY_CALLDIR	(1UL << 5)
+#define STREAM_RCV_LAST_FRAG	(1UL << 0)
+#define STREAM_RCV_COPY_FRAGHDR	(1UL << 1)
+#define STREAM_RCV_COPY_XID	(1UL << 2)
+#define STREAM_RCV_COPY_DATA	(1UL << 3)
+#define STREAM_RCV_READ_CALLDIR	(1UL << 4)
+#define STREAM_RCV_COPY_CALLDIR	(1UL << 5)
 
 /*
  * TCP RPC flags
  */
-#define TCP_RPC_REPLY		(1UL << 6)
+#define STREAM_RPC_REPLY	(1UL << 6)
 
 #define XPRT_SOCK_CONNECTING	1U
 #define XPRT_SOCK_DATA_READY	(2)
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 8a707f8..916a30b 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -400,15 +400,15 @@ TRACE_EVENT(xs_tcp_data_ready,
 
 #define rpc_show_sock_xprt_flags(flags) \
 	__print_flags(flags, "|", \
-		{ TCP_RCV_LAST_FRAG, "TCP_RCV_LAST_FRAG" }, \
-		{ TCP_RCV_COPY_FRAGHDR, "TCP_RCV_COPY_FRAGHDR" }, \
-		{ TCP_RCV_COPY_XID, "TCP_RCV_COPY_XID" }, \
-		{ TCP_RCV_COPY_DATA, "TCP_RCV_COPY_DATA" }, \
-		{ TCP_RCV_READ_CALLDIR, "TCP_RCV_READ_CALLDIR" }, \
-		{ TCP_RCV_COPY_CALLDIR, "TCP_RCV_COPY_CALLDIR" }, \
-		{ TCP_RPC_REPLY, "TCP_RPC_REPLY" })
+		{ STREAM_RCV_LAST_FRAG, "STREAM_RCV_LAST_FRAG" }, \
+		{ STREAM_RCV_COPY_FRAGHDR, "STREAM_RCV_COPY_FRAGHDR" }, \
+		{ STREAM_RCV_COPY_XID, "STREAM_RCV_COPY_XID" }, \
+		{ STREAM_RCV_COPY_DATA, "STREAM_RCV_COPY_DATA" }, \
+		{ STREAM_RCV_READ_CALLDIR, "STREAM_RCV_READ_CALLDIR" }, \
+		{ STREAM_RCV_COPY_CALLDIR, "STREAM_RCV_COPY_CALLDIR" }, \
+		{ STREAM_RPC_REPLY, "STREAM_RPC_REPLY" })
 
-TRACE_EVENT(xs_tcp_data_recv,
+TRACE_EVENT(xs_stream_data_recv,
 	TP_PROTO(struct sock_xprt *xs),
 
 	TP_ARGS(xs),
@@ -426,11 +426,11 @@ TRACE_EVENT(xs_tcp_data_recv,
 	TP_fast_assign(
 		__assign_str(addr, xs->xprt.address_strings[RPC_DISPLAY_ADDR]);
 		__assign_str(port, xs->xprt.address_strings[RPC_DISPLAY_PORT]);
-		__entry->xid = xs->tcp_xid;
-		__entry->flags = xs->tcp_flags;
-		__entry->copied = xs->tcp_copied;
-		__entry->reclen = xs->tcp_reclen;
-		__entry->offset = xs->tcp_offset;
+		__entry->xid = xs->stream_xid;
+		__entry->flags = xs->stream_flags;
+		__entry->copied = xs->stream_copied;
+		__entry->reclen = xs->stream_reclen;
+		__entry->offset = xs->stream_offset;
 	),
 
 	TP_printk("peer=[%s]:%s xid=0x%x flags=%s copied=%lu reclen=%u offset=%lu",
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index d5b54c0..5c21aec 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1141,119 +1141,119 @@ static void xs_tcp_force_close(struct rpc_xprt *xprt)
 	xprt_force_disconnect(xprt);
 }
 
-static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_reader *desc)
+static inline void xs_stream_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_reader *desc)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
 	size_t len, used;
 	char *p;
 
-	p = ((char *) &transport->tcp_fraghdr) + transport->tcp_offset;
-	len = sizeof(transport->tcp_fraghdr) - transport->tcp_offset;
+	p = ((char *) &transport->stream_fraghdr) + transport->stream_offset;
+	len = sizeof(transport->stream_fraghdr) - transport->stream_offset;
 	used = xdr_skb_read_bits(desc, p, len);
-	transport->tcp_offset += used;
+	transport->stream_offset += used;
 	if (used != len)
 		return;
 
-	transport->tcp_reclen = ntohl(transport->tcp_fraghdr);
-	if (transport->tcp_reclen & RPC_LAST_STREAM_FRAGMENT)
-		transport->tcp_flags |= TCP_RCV_LAST_FRAG;
+	transport->stream_reclen = ntohl(transport->stream_fraghdr);
+	if (transport->stream_reclen & RPC_LAST_STREAM_FRAGMENT)
+		transport->stream_flags |= STREAM_RCV_LAST_FRAG;
 	else
-		transport->tcp_flags &= ~TCP_RCV_LAST_FRAG;
-	transport->tcp_reclen &= RPC_FRAGMENT_SIZE_MASK;
+		transport->stream_flags &= ~STREAM_RCV_LAST_FRAG;
+	transport->stream_reclen &= RPC_FRAGMENT_SIZE_MASK;
 
-	transport->tcp_flags &= ~TCP_RCV_COPY_FRAGHDR;
-	transport->tcp_offset = 0;
+	transport->stream_flags &= ~STREAM_RCV_COPY_FRAGHDR;
+	transport->stream_offset = 0;
 
 	/* Sanity check of the record length */
-	if (unlikely(transport->tcp_reclen < 8)) {
-		dprintk("RPC:       invalid TCP record fragment length\n");
+	if (unlikely(transport->stream_reclen < 8)) {
+		dprintk("RPC:       invalid record fragment length\n");
 		xs_tcp_force_close(xprt);
 		return;
 	}
-	dprintk("RPC:       reading TCP record fragment of length %d\n",
-			transport->tcp_reclen);
+	dprintk("RPC:       reading record fragment of length %d\n",
+			transport->stream_reclen);
 }
 
-static void xs_tcp_check_fraghdr(struct sock_xprt *transport)
+static void xs_stream_check_fraghdr(struct sock_xprt *transport)
 {
-	if (transport->tcp_offset == transport->tcp_reclen) {
-		transport->tcp_flags |= TCP_RCV_COPY_FRAGHDR;
-		transport->tcp_offset = 0;
-		if (transport->tcp_flags & TCP_RCV_LAST_FRAG) {
-			transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
-			transport->tcp_flags |= TCP_RCV_COPY_XID;
-			transport->tcp_copied = 0;
+	if (transport->stream_offset == transport->stream_reclen) {
+		transport->stream_flags |= STREAM_RCV_COPY_FRAGHDR;
+		transport->stream_offset = 0;
+		if (transport->stream_flags & STREAM_RCV_LAST_FRAG) {
+			transport->stream_flags &= ~STREAM_RCV_COPY_DATA;
+			transport->stream_flags |= STREAM_RCV_COPY_XID;
+			transport->stream_copied = 0;
 		}
 	}
 }
 
-static inline void xs_tcp_read_xid(struct sock_xprt *transport, struct xdr_skb_reader *desc)
+static inline void xs_stream_read_xid(struct sock_xprt *transport, struct xdr_skb_reader *desc)
 {
 	size_t len, used;
 	char *p;
 
-	len = sizeof(transport->tcp_xid) - transport->tcp_offset;
+	len = sizeof(transport->stream_xid) - transport->stream_offset;
 	dprintk("RPC:       reading XID (%zu bytes)\n", len);
-	p = ((char *) &transport->tcp_xid) + transport->tcp_offset;
+	p = ((char *) &transport->stream_xid) + transport->stream_offset;
 	used = xdr_skb_read_bits(desc, p, len);
-	transport->tcp_offset += used;
+	transport->stream_offset += used;
 	if (used != len)
 		return;
-	transport->tcp_flags &= ~TCP_RCV_COPY_XID;
-	transport->tcp_flags |= TCP_RCV_READ_CALLDIR;
-	transport->tcp_copied = 4;
+	transport->stream_flags &= ~STREAM_RCV_COPY_XID;
+	transport->stream_flags |= STREAM_RCV_READ_CALLDIR;
+	transport->stream_copied = 4;
 	dprintk("RPC:       reading %s XID %08x\n",
-			(transport->tcp_flags & TCP_RPC_REPLY) ? "reply for"
+			(transport->stream_flags & STREAM_RPC_REPLY) ? "reply for"
 							      : "request with",
-			ntohl(transport->tcp_xid));
-	xs_tcp_check_fraghdr(transport);
+			ntohl(transport->stream_xid));
+	xs_stream_check_fraghdr(transport);
 }
 
-static inline void xs_tcp_read_calldir(struct sock_xprt *transport,
-				       struct xdr_skb_reader *desc)
+static inline void xs_stream_read_calldir(struct sock_xprt *transport,
+					  struct xdr_skb_reader *desc)
 {
 	size_t len, used;
 	u32 offset;
 	char *p;
 
 	/*
-	 * We want transport->tcp_offset to be 8 at the end of this routine
+	 * We want transport->stream_offset to be 8 at the end of this routine
 	 * (4 bytes for the xid and 4 bytes for the call/reply flag).
 	 * When this function is called for the first time,
-	 * transport->tcp_offset is 4 (after having already read the xid).
+	 * transport->stream_offset is 4 (after having already read the xid).
 	 */
-	offset = transport->tcp_offset - sizeof(transport->tcp_xid);
-	len = sizeof(transport->tcp_calldir) - offset;
+	offset = transport->stream_offset - sizeof(transport->stream_xid);
+	len = sizeof(transport->stream_calldir) - offset;
 	dprintk("RPC:       reading CALL/REPLY flag (%zu bytes)\n", len);
-	p = ((char *) &transport->tcp_calldir) + offset;
+	p = ((char *) &transport->stream_calldir) + offset;
 	used = xdr_skb_read_bits(desc, p, len);
-	transport->tcp_offset += used;
+	transport->stream_offset += used;
 	if (used != len)
 		return;
-	transport->tcp_flags &= ~TCP_RCV_READ_CALLDIR;
+	transport->stream_flags &= ~STREAM_RCV_READ_CALLDIR;
 	/*
 	 * We don't yet have the XDR buffer, so we will write the calldir
 	 * out after we get the buffer from the 'struct rpc_rqst'
 	 */
-	switch (ntohl(transport->tcp_calldir)) {
+	switch (ntohl(transport->stream_calldir)) {
 	case RPC_REPLY:
-		transport->tcp_flags |= TCP_RCV_COPY_CALLDIR;
-		transport->tcp_flags |= TCP_RCV_COPY_DATA;
-		transport->tcp_flags |= TCP_RPC_REPLY;
+		transport->stream_flags |= STREAM_RCV_COPY_CALLDIR;
+		transport->stream_flags |= STREAM_RCV_COPY_DATA;
+		transport->stream_flags |= STREAM_RPC_REPLY;
 		break;
 	case RPC_CALL:
-		transport->tcp_flags |= TCP_RCV_COPY_CALLDIR;
-		transport->tcp_flags |= TCP_RCV_COPY_DATA;
-		transport->tcp_flags &= ~TCP_RPC_REPLY;
+		transport->stream_flags |= STREAM_RCV_COPY_CALLDIR;
+		transport->stream_flags |= STREAM_RCV_COPY_DATA;
+		transport->stream_flags &= ~STREAM_RPC_REPLY;
 		break;
 	default:
 		dprintk("RPC:       invalid request message type\n");
 		xs_tcp_force_close(&transport->xprt);
 	}
-	xs_tcp_check_fraghdr(transport);
+	xs_stream_check_fraghdr(transport);
 }
 
-static inline void xs_tcp_read_common(struct rpc_xprt *xprt,
+static inline void xs_stream_read_common(struct rpc_xprt *xprt,
 				     struct xdr_skb_reader *desc,
 				     struct rpc_rqst *req)
 {
@@ -1265,97 +1265,97 @@ static inline void xs_tcp_read_common(struct rpc_xprt *xprt,
 
 	rcvbuf = &req->rq_private_buf;
 
-	if (transport->tcp_flags & TCP_RCV_COPY_CALLDIR) {
+	if (transport->stream_flags & STREAM_RCV_COPY_CALLDIR) {
 		/*
 		 * Save the RPC direction in the XDR buffer
 		 */
-		memcpy(rcvbuf->head[0].iov_base + transport->tcp_copied,
-			&transport->tcp_calldir,
-			sizeof(transport->tcp_calldir));
-		transport->tcp_copied += sizeof(transport->tcp_calldir);
-		transport->tcp_flags &= ~TCP_RCV_COPY_CALLDIR;
+		memcpy(rcvbuf->head[0].iov_base + transport->stream_copied,
+			&transport->stream_calldir,
+			sizeof(transport->stream_calldir));
+		transport->stream_copied += sizeof(transport->stream_calldir);
+		transport->stream_flags &= ~STREAM_RCV_COPY_CALLDIR;
 	}
 
 	len = desc->count;
-	if (len > transport->tcp_reclen - transport->tcp_offset) {
+	if (len > transport->stream_reclen - transport->stream_offset) {
 		struct xdr_skb_reader my_desc;
 
-		len = transport->tcp_reclen - transport->tcp_offset;
+		len = transport->stream_reclen - transport->stream_offset;
 		memcpy(&my_desc, desc, sizeof(my_desc));
 		my_desc.count = len;
-		r = xdr_partial_copy_from_skb(rcvbuf, transport->tcp_copied,
+		r = xdr_partial_copy_from_skb(rcvbuf, transport->stream_copied,
 					  &my_desc, xdr_skb_read_bits);
 		desc->count -= r;
 		desc->offset += r;
 	} else
-		r = xdr_partial_copy_from_skb(rcvbuf, transport->tcp_copied,
+		r = xdr_partial_copy_from_skb(rcvbuf, transport->stream_copied,
 					  desc, xdr_skb_read_bits);
 
 	if (r > 0) {
-		transport->tcp_copied += r;
-		transport->tcp_offset += r;
+		transport->stream_copied += r;
+		transport->stream_offset += r;
 	}
 	if (r != len) {
 		/* Error when copying to the receive buffer,
 		 * usually because we weren't able to allocate
 		 * additional buffer pages. All we can do now
-		 * is turn off TCP_RCV_COPY_DATA, so the request
+		 * is turn off STREAM_RCV_COPY_DATA, so the request
 		 * will not receive any additional updates,
 		 * and time out.
 		 * Any remaining data from this record will
 		 * be discarded.
 		 */
-		transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
+		transport->stream_flags &= ~STREAM_RCV_COPY_DATA;
 		dprintk("RPC:       XID %08x truncated request\n",
-				ntohl(transport->tcp_xid));
-		dprintk("RPC:       xprt = %p, tcp_copied = %lu, "
-				"tcp_offset = %u, tcp_reclen = %u\n",
-				xprt, transport->tcp_copied,
-				transport->tcp_offset, transport->tcp_reclen);
+				ntohl(transport->stream_xid));
+		dprintk("RPC:       xprt = %p, stream_copied = %lu, "
+				"stream_offset = %u, stream_reclen = %u\n",
+				xprt, transport->stream_copied,
+				transport->stream_offset, transport->stream_reclen);
 		return;
 	}
 
 	dprintk("RPC:       XID %08x read %zd bytes\n",
-			ntohl(transport->tcp_xid), r);
-	dprintk("RPC:       xprt = %p, tcp_copied = %lu, tcp_offset = %u, "
-			"tcp_reclen = %u\n", xprt, transport->tcp_copied,
-			transport->tcp_offset, transport->tcp_reclen);
+			ntohl(transport->stream_xid), r);
+	dprintk("RPC:       xprt = %p, stream_copied = %lu, stream_offset = %u, "
+			"stream_reclen = %u\n", xprt, transport->stream_copied,
+			transport->stream_offset, transport->stream_reclen);
 
-	if (transport->tcp_copied == req->rq_private_buf.buflen)
-		transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
-	else if (transport->tcp_offset == transport->tcp_reclen) {
-		if (transport->tcp_flags & TCP_RCV_LAST_FRAG)
-			transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
+	if (transport->stream_copied == req->rq_private_buf.buflen)
+		transport->stream_flags &= ~STREAM_RCV_COPY_DATA;
+	else if (transport->stream_offset == transport->stream_reclen) {
+		if (transport->stream_flags & STREAM_RCV_LAST_FRAG)
+			transport->stream_flags &= ~STREAM_RCV_COPY_DATA;
 	}
 }
 
 /*
  * Finds the request corresponding to the RPC xid and invokes the common
- * tcp read code to read the data.
+ * read code to read the data.
  */
-static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
+static inline int xs_stream_read_reply(struct rpc_xprt *xprt,
 				    struct xdr_skb_reader *desc)
 {
 	struct sock_xprt *transport =
 				container_of(xprt, struct sock_xprt, xprt);
 	struct rpc_rqst *req;
 
-	dprintk("RPC:       read reply XID %08x\n", ntohl(transport->tcp_xid));
+	dprintk("RPC:       read reply XID %08x\n", ntohl(transport->stream_xid));
 
 	/* Find and lock the request corresponding to this xid */
 	spin_lock_bh(&xprt->transport_lock);
-	req = xprt_lookup_rqst(xprt, transport->tcp_xid);
+	req = xprt_lookup_rqst(xprt, transport->stream_xid);
 	if (!req) {
 		dprintk("RPC:       XID %08x request not found!\n",
-				ntohl(transport->tcp_xid));
+				ntohl(transport->stream_xid));
 		spin_unlock_bh(&xprt->transport_lock);
 		return -1;
 	}
 
-	xs_tcp_read_common(xprt, desc, req);
+	xs_stream_read_common(xprt, desc, req);
 
-	if (!(transport->tcp_flags & TCP_RCV_COPY_DATA))
-		xprt_complete_rqst(req->rq_task, transport->tcp_copied);
+	if (!(transport->stream_flags & STREAM_RCV_COPY_DATA))
+		xprt_complete_rqst(req->rq_task, transport->stream_copied);
 
 	spin_unlock_bh(&xprt->transport_lock);
 	return 0;
@@ -1369,7 +1369,7 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
  * If we're unable to obtain the rpc_rqst we schedule the closing of the
  * connection and return -1.
  */
-static int xs_tcp_read_callback(struct rpc_xprt *xprt,
+static int xs_stream_read_callback(struct rpc_xprt *xprt,
 				       struct xdr_skb_reader *desc)
 {
 	struct sock_xprt *transport =
@@ -1378,7 +1378,7 @@ static int xs_tcp_read_callback(struct rpc_xprt *xprt,
 
 	/* Look up and lock the request corresponding to the given XID */
 	spin_lock_bh(&xprt->transport_lock);
-	req = xprt_lookup_bc_request(xprt, transport->tcp_xid);
+	req = xprt_lookup_bc_request(xprt, transport->stream_xid);
 	if (req == NULL) {
 		spin_unlock_bh(&xprt->transport_lock);
 		printk(KERN_WARNING "Callback slot table overflowed\n");
@@ -1387,24 +1387,24 @@ static int xs_tcp_read_callback(struct rpc_xprt *xprt,
 	}
 
 	dprintk("RPC:       read callback  XID %08x\n", ntohl(req->rq_xid));
-	xs_tcp_read_common(xprt, desc, req);
+	xs_stream_read_common(xprt, desc, req);
 
-	if (!(transport->tcp_flags & TCP_RCV_COPY_DATA))
-		xprt_complete_bc_request(req, transport->tcp_copied);
+	if (!(transport->stream_flags & STREAM_RCV_COPY_DATA))
+		xprt_complete_bc_request(req, transport->stream_copied);
 	spin_unlock_bh(&xprt->transport_lock);
 
 	return 0;
 }
 
-static inline int _xs_tcp_read_data(struct rpc_xprt *xprt,
-					struct xdr_skb_reader *desc)
+static inline int _xs_stream_read_data(struct rpc_xprt *xprt,
+				       struct xdr_skb_reader *desc)
 {
 	struct sock_xprt *transport =
 				container_of(xprt, struct sock_xprt, xprt);
 
-	return (transport->tcp_flags & TCP_RPC_REPLY) ?
-		xs_tcp_read_reply(xprt, desc) :
-		xs_tcp_read_callback(xprt, desc);
+	return (transport->stream_flags & STREAM_RPC_REPLY) ?
+		xs_stream_read_reply(xprt, desc) :
+		xs_stream_read_callback(xprt, desc);
 }
 
 static int xs_tcp_bc_up(struct svc_serv *serv, struct net *net)
@@ -1423,10 +1423,10 @@ static size_t xs_tcp_bc_maxpayload(struct rpc_xprt *xprt)
 	return PAGE_SIZE;
 }
 #else
-static inline int _xs_tcp_read_data(struct rpc_xprt *xprt,
-					struct xdr_skb_reader *desc)
+static inline int _xs_stream_read_data(struct rpc_xprt *xprt,
+				       struct xdr_skb_reader *desc)
 {
-	return xs_tcp_read_reply(xprt, desc);
+	return xs_stream_read_reply(xprt, desc);
 }
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 
@@ -1434,38 +1434,38 @@ static inline int _xs_tcp_read_data(struct rpc_xprt *xprt,
  * Read data off the transport.  This can be either an RPC_CALL or an
  * RPC_REPLY.  Relay the processing to helper functions.
  */
-static void xs_tcp_read_data(struct rpc_xprt *xprt,
-				    struct xdr_skb_reader *desc)
+static void xs_stream_read_data(struct rpc_xprt *xprt,
+				struct xdr_skb_reader *desc)
 {
 	struct sock_xprt *transport =
 				container_of(xprt, struct sock_xprt, xprt);
 
-	if (_xs_tcp_read_data(xprt, desc) == 0)
-		xs_tcp_check_fraghdr(transport);
+	if (_xs_stream_read_data(xprt, desc) == 0)
+		xs_stream_check_fraghdr(transport);
 	else {
 		/*
 		 * The transport_lock protects the request handling.
-		 * There's no need to hold it to update the tcp_flags.
+		 * There's no need to hold it to update the stream_flags.
 		 */
-		transport->tcp_flags &= ~TCP_RCV_COPY_DATA;
+		transport->stream_flags &= ~STREAM_RCV_COPY_DATA;
 	}
 }
 
-static inline void xs_tcp_read_discard(struct sock_xprt *transport, struct xdr_skb_reader *desc)
+static inline void xs_stream_read_discard(struct sock_xprt *transport, struct xdr_skb_reader *desc)
 {
 	size_t len;
 
-	len = transport->tcp_reclen - transport->tcp_offset;
+	len = transport->stream_reclen - transport->stream_offset;
 	if (len > desc->count)
 		len = desc->count;
 	desc->count -= len;
 	desc->offset += len;
-	transport->tcp_offset += len;
+	transport->stream_offset += len;
 	dprintk("RPC:       discarded %zu bytes\n", len);
-	xs_tcp_check_fraghdr(transport);
+	xs_stream_check_fraghdr(transport);
 }
 
-static int xs_tcp_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb, unsigned int offset, size_t len)
+static int xs_stream_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb, unsigned int offset, size_t len)
 {
 	struct rpc_xprt *xprt = rd_desc->arg.data;
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -1475,35 +1475,35 @@ static int xs_tcp_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb, uns
 		.count	= len,
 	};
 
-	dprintk("RPC:       xs_tcp_data_recv started\n");
+	dprintk("RPC:       %s started\n", __func__);
 	do {
-		trace_xs_tcp_data_recv(transport);
+		trace_xs_stream_data_recv(transport);
 		/* Read in a new fragment marker if necessary */
 		/* Can we ever really expect to get completely empty fragments? */
-		if (transport->tcp_flags & TCP_RCV_COPY_FRAGHDR) {
-			xs_tcp_read_fraghdr(xprt, &desc);
+		if (transport->stream_flags & STREAM_RCV_COPY_FRAGHDR) {
+			xs_stream_read_fraghdr(xprt, &desc);
 			continue;
 		}
 		/* Read in the xid if necessary */
-		if (transport->tcp_flags & TCP_RCV_COPY_XID) {
-			xs_tcp_read_xid(transport, &desc);
+		if (transport->stream_flags & STREAM_RCV_COPY_XID) {
+			xs_stream_read_xid(transport, &desc);
 			continue;
 		}
 		/* Read in the call/reply flag */
-		if (transport->tcp_flags & TCP_RCV_READ_CALLDIR) {
-			xs_tcp_read_calldir(transport, &desc);
+		if (transport->stream_flags & STREAM_RCV_READ_CALLDIR) {
+			xs_stream_read_calldir(transport, &desc);
 			continue;
 		}
 		/* Read in the request data */
-		if (transport->tcp_flags & TCP_RCV_COPY_DATA) {
-			xs_tcp_read_data(xprt, &desc);
+		if (transport->stream_flags & STREAM_RCV_COPY_DATA) {
+			xs_stream_read_data(xprt, &desc);
 			continue;
 		}
 		/* Skip over any trailing bytes on short reads */
-		xs_tcp_read_discard(transport, &desc);
+		xs_stream_read_discard(transport, &desc);
 	} while (desc.count);
-	trace_xs_tcp_data_recv(transport);
-	dprintk("RPC:       xs_tcp_data_recv done\n");
+	trace_xs_stream_data_recv(transport);
+	dprintk("RPC:       %s done\n", __func__);
 	return len - desc.count;
 }
 
@@ -1526,7 +1526,7 @@ static void xs_tcp_data_receive(struct sock_xprt *transport)
 	/* We use rd_desc to pass struct xprt to xs_tcp_data_recv */
 	for (;;) {
 		lock_sock(sk);
-		read = tcp_read_sock(sk, &rd_desc, xs_tcp_data_recv);
+		read = tcp_read_sock(sk, &rd_desc, xs_stream_data_recv);
 		if (read <= 0) {
 			clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
 			release_sock(sk);
@@ -1577,12 +1577,12 @@ static void xs_tcp_state_change(struct sock *sk)
 		spin_lock(&xprt->transport_lock);
 		if (!xprt_test_and_set_connected(xprt)) {
 
-			/* Reset TCP record info */
-			transport->tcp_offset = 0;
-			transport->tcp_reclen = 0;
-			transport->tcp_copied = 0;
-			transport->tcp_flags =
-				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
+			/* Reset stream record info */
+			transport->stream_offset = 0;
+			transport->stream_reclen = 0;
+			transport->stream_copied = 0;
+			transport->stream_flags =
+				STREAM_RCV_COPY_FRAGHDR | STREAM_RCV_COPY_XID;
 			xprt->connect_cookie++;
 			clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
 			xprt_clear_connecting(xprt);
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 03/14] SUNRPC: abstract tcp_read_sock() in record fragment parser
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 01/14] SUNRPC: add AF_VSOCK support to addr.[ch] Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 02/14] SUNRPC: rename "TCP" record parser to "stream" parser Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 04/14] SUNRPC: extract xs_stream_reset_state() Stefan Hajnoczi
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

Use a function pointer to abstract tcp_read_sock()-like functions.  For
TCP this function will be tcp_read_sock().  For AF_VSOCK it will be
vsock_read_sock().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/linux/sunrpc/xprtsock.h |  5 +++++
 net/sunrpc/xprtsock.c           | 14 ++++++++------
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index 6749774..a778973 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -9,6 +9,8 @@
 
 #ifdef __KERNEL__
 
+#include <net/tcp.h> /* for sk_read_actor_t */
+
 int		init_socket_xprt(void);
 void		cleanup_socket_xprt(void);
 
@@ -39,6 +41,9 @@ struct sock_xprt {
 	unsigned long		stream_copied,
 				stream_flags;
 
+	int			(*stream_read_sock)(struct sock *,
+						    read_descriptor_t *,
+						    sk_read_actor_t);
 
 	/*
 	 * Connection of transports
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 5c21aec..184e263 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1507,7 +1507,7 @@ static int xs_stream_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
 	return len - desc.count;
 }
 
-static void xs_tcp_data_receive(struct sock_xprt *transport)
+static void xs_stream_data_receive(struct sock_xprt *transport)
 {
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct sock *sk;
@@ -1523,10 +1523,11 @@ static void xs_tcp_data_receive(struct sock_xprt *transport)
 	if (sk == NULL)
 		goto out;
 
-	/* We use rd_desc to pass struct xprt to xs_tcp_data_recv */
+	/* We use rd_desc to pass struct xprt to xs_stream_data_recv */
 	for (;;) {
 		lock_sock(sk);
-		read = tcp_read_sock(sk, &rd_desc, xs_stream_data_recv);
+		read = transport->stream_read_sock(sk, &rd_desc,
+						   xs_stream_data_recv);
 		if (read <= 0) {
 			clear_bit(XPRT_SOCK_DATA_READY, &transport->sock_state);
 			release_sock(sk);
@@ -1543,11 +1544,11 @@ static void xs_tcp_data_receive(struct sock_xprt *transport)
 	trace_xs_tcp_data_ready(xprt, read, total);
 }
 
-static void xs_tcp_data_receive_workfn(struct work_struct *work)
+static void xs_stream_data_receive_workfn(struct work_struct *work)
 {
 	struct sock_xprt *transport =
 		container_of(work, struct sock_xprt, recv_worker);
-	xs_tcp_data_receive(transport);
+	xs_stream_data_receive(transport);
 }
 
 /**
@@ -1583,6 +1584,7 @@ static void xs_tcp_state_change(struct sock *sk)
 			transport->stream_copied = 0;
 			transport->stream_flags =
 				STREAM_RCV_COPY_FRAGHDR | STREAM_RCV_COPY_XID;
+			transport->stream_read_sock = tcp_read_sock;
 			xprt->connect_cookie++;
 			clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
 			xprt_clear_connecting(xprt);
@@ -3065,7 +3067,7 @@ static struct rpc_xprt *xs_setup_tcp(struct xprt_create *args)
 	xprt->connect_timeout = xprt->timeout->to_initval *
 		(xprt->timeout->to_retries + 1);
 
-	INIT_WORK(&transport->recv_worker, xs_tcp_data_receive_workfn);
+	INIT_WORK(&transport->recv_worker, xs_stream_data_receive_workfn);
 	INIT_DELAYED_WORK(&transport->connect_worker, xs_tcp_setup_socket);
 
 	switch (addr->sa_family) {
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 04/14] SUNRPC: extract xs_stream_reset_state()
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 03/14] SUNRPC: abstract tcp_read_sock() in record fragment parser Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 05/14] VSOCK: add tcp_read_sock()-like vsock_read_sock() function Stefan Hajnoczi
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

Extract a function to reset the record fragment parser.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 net/sunrpc/xprtsock.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 184e263..fd0c8b1 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1552,6 +1552,28 @@ static void xs_stream_data_receive_workfn(struct work_struct *work)
 }
 
 /**
+ * xs_stream_reset_state - reset SOCK_STREAM record parser
+ * @transport: socket transport
+ * @read_sock: tcp_read_sock()-like function
+ *
+ */
+static void xs_stream_reset_state(struct rpc_xprt *xprt,
+				  int (*read_sock)(struct sock *,
+						   read_descriptor_t *,
+						   sk_read_actor_t))
+{
+	struct sock_xprt *transport = container_of(xprt,
+			struct sock_xprt, xprt);
+
+	transport->stream_offset = 0;
+	transport->stream_reclen = 0;
+	transport->stream_copied = 0;
+	transport->stream_flags =
+		STREAM_RCV_COPY_FRAGHDR | STREAM_RCV_COPY_XID;
+	transport->stream_read_sock = read_sock;
+}
+
+/**
  * xs_tcp_state_change - callback to handle TCP socket state changes
  * @sk: socket whose state has changed
  *
@@ -1577,14 +1599,7 @@ static void xs_tcp_state_change(struct sock *sk)
 	case TCP_ESTABLISHED:
 		spin_lock(&xprt->transport_lock);
 		if (!xprt_test_and_set_connected(xprt)) {
-
-			/* Reset stream record info */
-			transport->stream_offset = 0;
-			transport->stream_reclen = 0;
-			transport->stream_copied = 0;
-			transport->stream_flags =
-				STREAM_RCV_COPY_FRAGHDR | STREAM_RCV_COPY_XID;
-			transport->stream_read_sock = tcp_read_sock;
+			xs_stream_reset_state(xprt, tcp_read_sock);
 			xprt->connect_cookie++;
 			clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
 			xprt_clear_connecting(xprt);
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 05/14] VSOCK: add tcp_read_sock()-like vsock_read_sock() function
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 04/14] SUNRPC: extract xs_stream_reset_state() Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-10-31 13:35   ` Jeff Layton
  2017-06-30 13:23 ` [PATCH v3 06/14] SUNRPC: add AF_VSOCK support to xprtsock.c Stefan Hajnoczi
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

The tcp_read_sock() interface dequeues skbs and gives them to the
caller's callback function for processing.  This interface can avoid
data copies since the caller accesses the skb instead of using its own
receive buffer.

This patch implements vsock_read_sock() for AF_VSOCK SOCK_STREAM
sockets.  The implementation is only for virtio-vsock at this time, not
for the VMware VMCI transport.  It is not zero-copy yet because the
virtio-vsock receive queue does not consist of skbs.

The tcp_read_sock()-like interface is needed for AF_VSOCK sunrpc
support.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/linux/virtio_vsock.h            |  4 ++
 include/net/af_vsock.h                  |  5 +++
 drivers/vhost/vsock.c                   |  1 +
 net/vmw_vsock/af_vsock.c                | 16 ++++++++
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 66 +++++++++++++++++++++++++++++++++
 net/vmw_vsock/vmci_transport.c          |  8 ++++
 7 files changed, 101 insertions(+)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index ab13f07..d6cf428 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -5,6 +5,7 @@
 #include <linux/socket.h>
 #include <net/sock.h>
 #include <net/af_vsock.h>
+#include <net/tcp.h> /* for sk_read_actor_t */
 
 #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
 #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
@@ -126,6 +127,9 @@ int virtio_transport_notify_send_post_enqueue(struct vsock_sock *vsk,
 u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
 bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
 bool virtio_transport_stream_allow(u32 cid, u32 port);
+int virtio_transport_stream_read_sock(struct vsock_sock *vsk,
+				      read_descriptor_t *desc,
+				      sk_read_actor_t recv_actor);
 int virtio_transport_dgram_bind(struct vsock_sock *vsk,
 				struct sockaddr_vm *addr);
 bool virtio_transport_dgram_allow(u32 cid, u32 port);
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f9fb566..85fa345 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -19,6 +19,7 @@
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
 #include <linux/vm_sockets.h>
+#include <net/tcp.h> /* for sk_read_actor_t */
 
 #include "vsock_addr.h"
 
@@ -73,6 +74,8 @@ struct vsock_sock {
 	void *trans;
 };
 
+int vsock_read_sock(struct sock *sk, read_descriptor_t *desc,
+		    sk_read_actor_t recv_actor);
 s64 vsock_stream_has_data(struct vsock_sock *vsk);
 s64 vsock_stream_has_space(struct vsock_sock *vsk);
 void vsock_pending_work(struct work_struct *work);
@@ -125,6 +128,8 @@ struct vsock_transport {
 	u64 (*stream_rcvhiwat)(struct vsock_sock *);
 	bool (*stream_is_active)(struct vsock_sock *);
 	bool (*stream_allow)(u32 cid, u32 port);
+	int (*stream_read_sock)(struct vsock_sock *, read_descriptor_t *desc,
+				sk_read_actor_t recv_actor);
 
 	/* Notification. */
 	int (*notify_poll_in)(struct vsock_sock *, size_t, bool *);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 3acef3c..5128e41 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -734,6 +734,7 @@ static struct virtio_transport vhost_transport = {
 		.stream_rcvhiwat          = virtio_transport_stream_rcvhiwat,
 		.stream_is_active         = virtio_transport_stream_is_active,
 		.stream_allow             = virtio_transport_stream_allow,
+		.stream_read_sock         = virtio_transport_stream_read_sock,
 
 		.notify_poll_in           = virtio_transport_notify_poll_in,
 		.notify_poll_out          = virtio_transport_notify_poll_out,
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index dfc8c51e..2ae94b5a 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -706,6 +706,22 @@ static void vsock_sk_destruct(struct sock *sk)
 	put_cred(vsk->owner);
 }
 
+/* See documentation for tcp_read_sock() */
+int vsock_read_sock(struct sock *sk, read_descriptor_t *desc,
+		    sk_read_actor_t recv_actor)
+{
+	struct vsock_sock *vsp = vsock_sk(sk);
+
+	if (sk->sk_type != SOCK_STREAM)
+		return -EOPNOTSUPP;
+
+	if (sk->sk_state != SS_CONNECTED && sk->sk_state != SS_DISCONNECTING)
+		return -ENOTCONN;
+
+	return transport->stream_read_sock(vsp, desc, recv_actor);
+}
+EXPORT_SYMBOL(vsock_read_sock);
+
 static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	int err;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 403d86e80..8264307 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -520,6 +520,7 @@ static struct virtio_transport virtio_transport = {
 		.stream_rcvhiwat          = virtio_transport_stream_rcvhiwat,
 		.stream_is_active         = virtio_transport_stream_is_active,
 		.stream_allow             = virtio_transport_stream_allow,
+		.stream_read_sock         = virtio_transport_stream_read_sock,
 
 		.notify_poll_in           = virtio_transport_notify_poll_in,
 		.notify_poll_out          = virtio_transport_notify_poll_out,
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 18e2479..510d9e1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -316,6 +316,72 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
 EXPORT_SYMBOL_GPL(virtio_transport_stream_dequeue);
 
 int
+virtio_transport_stream_read_sock(struct vsock_sock *vsk,
+				  read_descriptor_t *desc,
+				  sk_read_actor_t recv_actor)
+{
+	struct virtio_vsock_sock *vvs;
+	int ret = 0;
+
+	vvs = vsk->trans;
+
+	spin_lock_bh(&vvs->rx_lock);
+	while (!list_empty(&vvs->rx_queue)) {
+		struct virtio_vsock_pkt *pkt;
+		struct sk_buff *skb;
+		size_t len;
+		int used;
+
+		pkt = list_first_entry(&vvs->rx_queue,
+				       struct virtio_vsock_pkt, list);
+
+		/* sk_lock is held by caller so no one else can dequeue.
+		 * Unlock rx_lock so recv_actor() can sleep.
+		 */
+		spin_unlock_bh(&vvs->rx_lock);
+
+		len = pkt->len - pkt->off;
+		skb = alloc_skb(len, GFP_KERNEL);
+		if (!skb) {
+			ret = -ENOMEM;
+			goto out_nolock;
+		}
+
+		memcpy(skb_put(skb, len),
+		       pkt->buf + pkt->off,
+		       len);
+
+		used = recv_actor(desc, skb, 0, len);
+
+		kfree_skb(skb);
+
+		spin_lock_bh(&vvs->rx_lock);
+
+		if (used > 0) {
+			ret += used;
+			pkt->off += used;
+			if (pkt->off == pkt->len) {
+				virtio_transport_dec_rx_pkt(vvs, pkt);
+				list_del(&pkt->list);
+				virtio_transport_free_pkt(pkt);
+			}
+		}
+
+		if (used <= 0 || !desc->count)
+			break;
+	}
+	spin_unlock_bh(&vvs->rx_lock);
+
+out_nolock:
+	if (ret > 0)
+		virtio_transport_send_credit_update(vsk,
+				VIRTIO_VSOCK_TYPE_STREAM, NULL);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_stream_read_sock);
+
+int
 virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
 			       struct msghdr *msg,
 			       size_t len, int flags)
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 10ae782..dfbdb4c 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -646,6 +646,13 @@ static bool vmci_transport_stream_allow(u32 cid, u32 port)
 	return true;
 }
 
+static int vmci_transport_stream_read_sock(struct vsock_sock *vsk,
+					   read_descriptor_t *desc,
+					   sk_read_actor_t recv_actor)
+{
+	return -EOPNOTSUPP; /* not yet implemented */
+}
+
 /* This is invoked as part of a tasklet that's scheduled when the VMCI
  * interrupt fires.  This is run in bottom-half context but it defers most of
  * its work to the packet handling work queue.
@@ -2061,6 +2068,7 @@ static const struct vsock_transport vmci_transport = {
 	.stream_rcvhiwat = vmci_transport_stream_rcvhiwat,
 	.stream_is_active = vmci_transport_stream_is_active,
 	.stream_allow = vmci_transport_stream_allow,
+	.stream_read_sock = vmci_transport_stream_read_sock,
 	.notify_poll_in = vmci_transport_notify_poll_in,
 	.notify_poll_out = vmci_transport_notify_poll_out,
 	.notify_recv_init = vmci_transport_notify_recv_init,
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 06/14] SUNRPC: add AF_VSOCK support to xprtsock.c
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 05/14] VSOCK: add tcp_read_sock()-like vsock_read_sock() function Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-11-07 13:46   ` Jeff Layton
  2017-06-30 13:23 ` [PATCH v3 07/14] SUNRPC: drop unnecessary svc_bc_tcp_create() helper Stefan Hajnoczi
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/linux/sunrpc/xprt.h |   1 +
 net/sunrpc/xprtsock.c       | 385 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 381 insertions(+), 5 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index eab1c74..c038d8a 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -170,6 +170,7 @@ enum xprt_transports {
 	XPRT_TRANSPORT_RDMA	= 256,
 	XPRT_TRANSPORT_BC_RDMA	= XPRT_TRANSPORT_RDMA | XPRT_TRANSPORT_BC,
 	XPRT_TRANSPORT_LOCAL	= 257,
+	XPRT_TRANSPORT_VSOCK	= 258,
 };
 
 struct rpc_xprt {
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index fd0c8b1..cc343b91 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -46,6 +46,7 @@
 #include <net/checksum.h>
 #include <net/udp.h>
 #include <net/tcp.h>
+#include <net/af_vsock.h>
 
 #include <trace/events/sunrpc.h>
 
@@ -271,6 +272,13 @@ static void xs_format_common_peer_addresses(struct rpc_xprt *xprt)
 		sin6 = xs_addr_in6(xprt);
 		snprintf(buf, sizeof(buf), "%pi6", &sin6->sin6_addr);
 		break;
+	case AF_VSOCK:
+		(void)rpc_ntop(sap, buf, sizeof(buf));
+		xprt->address_strings[RPC_DISPLAY_ADDR] =
+						kstrdup(buf, GFP_KERNEL);
+		snprintf(buf, sizeof(buf), "%08x",
+			 ((struct sockaddr_vm *)sap)->svm_cid);
+		break;
 	default:
 		BUG();
 	}
@@ -1881,21 +1889,30 @@ static int xs_bind(struct sock_xprt *transport, struct socket *sock)
 			nloop++;
 	} while (err == -EADDRINUSE && nloop != 2);
 
-	if (myaddr.ss_family == AF_INET)
+	switch (myaddr.ss_family) {
+	case AF_INET:
 		dprintk("RPC:       %s %pI4:%u: %s (%d)\n", __func__,
 				&((struct sockaddr_in *)&myaddr)->sin_addr,
 				port, err ? "failed" : "ok", err);
-	else
+		break;
+	case AF_INET6:
 		dprintk("RPC:       %s %pI6:%u: %s (%d)\n", __func__,
 				&((struct sockaddr_in6 *)&myaddr)->sin6_addr,
 				port, err ? "failed" : "ok", err);
+		break;
+	case AF_VSOCK:
+		dprintk("RPC:       %s %u:%u: %s (%d)\n", __func__,
+				((struct sockaddr_vm *)&myaddr)->svm_cid,
+				port, err ? "failed" : "ok", err);
+		break;
+	}
 	return err;
 }
 
 /*
- * We don't support autobind on AF_LOCAL sockets
+ * We don't support autobind on AF_LOCAL and AF_VSOCK sockets
  */
-static void xs_local_rpcbind(struct rpc_task *task)
+static void xs_dummy_rpcbind(struct rpc_task *task)
 {
 	xprt_set_bound(task->tk_xprt);
 }
@@ -1932,6 +1949,14 @@ static inline void xs_reclassify_socket6(struct socket *sock)
 		&xs_slock_key[1], "sk_lock-AF_INET6-RPC", &xs_key[1]);
 }
 
+static inline void xs_reclassify_socket_vsock(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+
+	sock_lock_init_class_and_name(sk, "slock-AF_VSOCK-RPC",
+		&xs_slock_key[1], "sk_lock-AF_VSOCK-RPC", &xs_key[1]);
+}
+
 static inline void xs_reclassify_socket(int family, struct socket *sock)
 {
 	if (WARN_ON_ONCE(!sock_allow_reclassification(sock->sk)))
@@ -1947,6 +1972,9 @@ static inline void xs_reclassify_socket(int family, struct socket *sock)
 	case AF_INET6:
 		xs_reclassify_socket6(sock);
 		break;
+	case AF_VSOCK:
+		xs_reclassify_socket_vsock(sock);
+		break;
 	}
 }
 #else
@@ -2743,7 +2771,7 @@ static struct rpc_xprt_ops xs_local_ops = {
 	.reserve_xprt		= xprt_reserve_xprt,
 	.release_xprt		= xs_tcp_release_xprt,
 	.alloc_slot		= xprt_alloc_slot,
-	.rpcbind		= xs_local_rpcbind,
+	.rpcbind		= xs_dummy_rpcbind,
 	.set_port		= xs_local_set_port,
 	.connect		= xs_local_connect,
 	.buf_alloc		= rpc_malloc,
@@ -2836,6 +2864,10 @@ static int xs_init_anyaddr(const int family, struct sockaddr *sap)
 		.sin6_family		= AF_INET6,
 		.sin6_addr		= IN6ADDR_ANY_INIT,
 	};
+	static const struct sockaddr_vm svm = {
+		.svm_family		= AF_VSOCK,
+		.svm_cid		= VMADDR_CID_ANY,
+	};
 
 	switch (family) {
 	case AF_LOCAL:
@@ -2846,6 +2878,9 @@ static int xs_init_anyaddr(const int family, struct sockaddr *sap)
 	case AF_INET6:
 		memcpy(sap, &sin6, sizeof(sin6));
 		break;
+	case AF_VSOCK:
+		memcpy(sap, &svm, sizeof(svm));
+		break;
 	default:
 		dprintk("RPC:       %s: Bad address family\n", __func__);
 		return -EAFNOSUPPORT;
@@ -3203,6 +3238,330 @@ static struct rpc_xprt *xs_setup_bc_tcp(struct xprt_create *args)
 	return ret;
 }
 
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+/**
+ * xs_vsock_state_change - callback to handle vsock socket state changes
+ * @sk: socket whose state has changed
+ *
+ */
+static void xs_vsock_state_change(struct sock *sk)
+{
+	struct rpc_xprt *xprt;
+
+	read_lock_bh(&sk->sk_callback_lock);
+	if (!(xprt = xprt_from_sock(sk)))
+		goto out;
+	dprintk("RPC:       %s client %p...\n", __func__, xprt);
+	dprintk("RPC:       state %x conn %d dead %d zapped %d sk_shutdown %d\n",
+			sk->sk_state, xprt_connected(xprt),
+			sock_flag(sk, SOCK_DEAD),
+			sock_flag(sk, SOCK_ZAPPED),
+			sk->sk_shutdown);
+
+	trace_rpc_socket_state_change(xprt, sk->sk_socket);
+
+	switch (sk->sk_state) {
+	case SS_CONNECTING:
+		/* Do nothing */
+		break;
+
+	case SS_CONNECTED:
+		spin_lock(&xprt->transport_lock);
+		if (!xprt_test_and_set_connected(xprt)) {
+			xs_stream_reset_state(xprt, vsock_read_sock);
+			xprt->connect_cookie++;
+
+			xprt_wake_pending_tasks(xprt, -EAGAIN);
+		}
+		spin_unlock(&xprt->transport_lock);
+		break;
+
+	case SS_DISCONNECTING:
+		/* TODO do we need to distinguish between various shutdown (client-side/server-side)? */
+		/* The client initiated a shutdown of the socket */
+		xprt->connect_cookie++;
+		xprt->reestablish_timeout = 0;
+		set_bit(XPRT_CLOSING, &xprt->state);
+		smp_mb__before_atomic();
+		clear_bit(XPRT_CONNECTED, &xprt->state);
+		clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+		smp_mb__after_atomic();
+		break;
+
+	case SS_UNCONNECTED:
+		xs_sock_mark_closed(xprt);
+		break;
+	}
+
+ out:
+	read_unlock_bh(&sk->sk_callback_lock);
+}
+
+/**
+ * xs_vsock_error_report - callback to handle vsock socket state errors
+ * @sk: socket
+ *
+ * Note: we don't call sock_error() since there may be a rpc_task
+ * using the socket, and so we don't want to clear sk->sk_err.
+ */
+static void xs_vsock_error_report(struct sock *sk)
+{
+	struct rpc_xprt *xprt;
+	int err;
+
+	read_lock_bh(&sk->sk_callback_lock);
+	if (!(xprt = xprt_from_sock(sk)))
+		goto out;
+
+	err = -sk->sk_err;
+	if (err == 0)
+		goto out;
+	/* Is this a reset event? */
+	if (sk->sk_state == SS_UNCONNECTED)
+		xs_sock_mark_closed(xprt);
+	dprintk("RPC:       %s client %p, error=%d...\n",
+			__func__, xprt, -err);
+	trace_rpc_socket_error(xprt, sk->sk_socket, err);
+	xprt_wake_pending_tasks(xprt, err);
+ out:
+	read_unlock_bh(&sk->sk_callback_lock);
+}
+
+/**
+ * xs_vsock_finish_connecting - initialize and connect socket
+ */
+static int xs_vsock_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
+	int ret = -ENOTCONN;
+
+	if (!transport->inet) {
+		struct sock *sk = sock->sk;
+
+		write_lock_bh(&sk->sk_callback_lock);
+
+		xs_save_old_callbacks(transport, sk);
+
+		sk->sk_user_data = xprt;
+		sk->sk_data_ready = xs_data_ready;
+		sk->sk_state_change = xs_vsock_state_change;
+		sk->sk_write_space = xs_tcp_write_space;
+		sk->sk_error_report = xs_vsock_error_report;
+		sk->sk_allocation = GFP_ATOMIC;
+
+		xprt_clear_connected(xprt);
+
+		/* Reset to new socket */
+		transport->sock = sock;
+		transport->inet = sk;
+
+		write_unlock_bh(&sk->sk_callback_lock);
+	}
+
+	if (!xprt_bound(xprt))
+		goto out;
+
+	xs_set_memalloc(xprt);
+
+	/* Tell the socket layer to start connecting... */
+	xprt->stat.connect_count++;
+	xprt->stat.connect_start = jiffies;
+	ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
+	switch (ret) {
+	case 0:
+		xs_set_srcport(transport, sock);
+	case -EINPROGRESS:
+		/* SYN_SENT! */
+		if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
+			xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
+	}
+out:
+	return ret;
+}
+
+/**
+ * xs_vsock_setup_socket - create a vsock socket and connect to a remote endpoint
+ *
+ * Invoked by a work queue tasklet.
+ */
+static void xs_vsock_setup_socket(struct work_struct *work)
+{
+	struct sock_xprt *transport =
+		container_of(work, struct sock_xprt, connect_worker.work);
+	struct socket *sock = transport->sock;
+	struct rpc_xprt *xprt = &transport->xprt;
+	int status = -EIO;
+
+	if (!sock) {
+		sock = xs_create_sock(xprt, transport,
+				xs_addr(xprt)->sa_family, SOCK_STREAM,
+				0, true);
+		if (IS_ERR(sock)) {
+			status = PTR_ERR(sock);
+			goto out;
+		}
+	}
+
+	dprintk("RPC:       worker connecting xprt %p via %s to "
+				"%s (port %s)\n", xprt,
+			xprt->address_strings[RPC_DISPLAY_PROTO],
+			xprt->address_strings[RPC_DISPLAY_ADDR],
+			xprt->address_strings[RPC_DISPLAY_PORT]);
+
+	status = xs_vsock_finish_connecting(xprt, sock);
+	trace_rpc_socket_connect(xprt, sock, status);
+	dprintk("RPC:       %p connect status %d connected %d sock state %d\n",
+			xprt, -status, xprt_connected(xprt),
+			sock->sk->sk_state);
+	switch (status) {
+	default:
+		printk("%s: connect returned unhandled error %d\n",
+			__func__, status);
+	case -EADDRNOTAVAIL:
+		/* We're probably in TIME_WAIT. Get rid of existing socket,
+		 * and retry
+		 */
+		xs_tcp_force_close(xprt);
+		break;
+	case 0:
+	case -EINPROGRESS:
+	case -EALREADY:
+		xprt_unlock_connect(xprt, transport);
+		xprt_clear_connecting(xprt);
+		return;
+	case -EINVAL:
+		/* Happens, for instance, if the user specified a link
+		 * local IPv6 address without a scope-id.
+		 */
+	case -ECONNREFUSED:
+	case -ECONNRESET:
+	case -ENETUNREACH:
+	case -EADDRINUSE:
+	case -ENOBUFS:
+		/* retry with existing socket, after a delay */
+		xs_tcp_force_close(xprt);
+		goto out;
+	}
+	status = -EAGAIN;
+out:
+	xprt_unlock_connect(xprt, transport);
+	xprt_clear_connecting(xprt);
+	xprt_wake_pending_tasks(xprt, status);
+}
+
+/**
+ * xs_vsock_print_stats - display vsock socket-specifc stats
+ * @xprt: rpc_xprt struct containing statistics
+ * @seq: output file
+ *
+ */
+static void xs_vsock_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
+	long idle_time = 0;
+
+	if (xprt_connected(xprt))
+		idle_time = (long)(jiffies - xprt->last_used) / HZ;
+
+	seq_printf(seq, "\txprt:\tvsock %u %lu %lu %lu %ld %lu %lu %lu "
+			"%llu %llu %lu %llu %llu\n",
+			transport->srcport,
+			xprt->stat.bind_count,
+			xprt->stat.connect_count,
+			xprt->stat.connect_time,
+			idle_time,
+			xprt->stat.sends,
+			xprt->stat.recvs,
+			xprt->stat.bad_xids,
+			xprt->stat.req_u,
+			xprt->stat.bklog_u,
+			xprt->stat.max_slots,
+			xprt->stat.sending_u,
+			xprt->stat.pending_u);
+}
+
+static struct rpc_xprt_ops xs_vsock_ops = {
+	.reserve_xprt		= xprt_reserve_xprt,
+	.release_xprt		= xs_tcp_release_xprt,
+	.alloc_slot		= xprt_lock_and_alloc_slot,
+	.rpcbind		= xs_dummy_rpcbind,
+	.set_port		= xs_set_port,
+	.connect		= xs_connect,
+	.buf_alloc		= rpc_malloc,
+	.buf_free		= rpc_free,
+	.send_request		= xs_tcp_send_request,
+	.set_retrans_timeout	= xprt_set_retrans_timeout_def,
+	.close			= xs_tcp_shutdown,
+	.destroy		= xs_destroy,
+	.print_stats		= xs_vsock_print_stats,
+};
+
+static const struct rpc_timeout xs_vsock_default_timeout = {
+	.to_initval = 60 * HZ,
+	.to_maxval = 60 * HZ,
+	.to_retries = 2,
+};
+
+/**
+ * xs_setup_vsock - Set up transport to use a vsock socket
+ * @args: rpc transport creation arguments
+ *
+ */
+static struct rpc_xprt *xs_setup_vsock(struct xprt_create *args)
+{
+	struct sockaddr_vm *addr = (struct sockaddr_vm *)args->dstaddr;
+	struct sock_xprt *transport;
+	struct rpc_xprt *xprt;
+	struct rpc_xprt *ret;
+
+	xprt = xs_setup_xprt(args, xprt_tcp_slot_table_entries,
+			     xprt_max_tcp_slot_table_entries);
+	if (IS_ERR(xprt))
+		return xprt;
+	transport = container_of(xprt, struct sock_xprt, xprt);
+
+	xprt->prot = 0;
+	xprt->tsh_size = sizeof(rpc_fraghdr) / sizeof(u32);
+	xprt->max_payload = RPC_MAX_FRAGMENT_SIZE;
+
+	xprt->bind_timeout = XS_BIND_TO;
+	xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
+	xprt->idle_timeout = XS_IDLE_DISC_TO;
+
+	xprt->ops = &xs_vsock_ops;
+	xprt->timeout = &xs_vsock_default_timeout;
+
+	INIT_WORK(&transport->recv_worker, xs_stream_data_receive_workfn);
+	INIT_DELAYED_WORK(&transport->connect_worker, xs_vsock_setup_socket);
+
+	switch (addr->svm_family) {
+	case AF_VSOCK:
+		if (addr->svm_port == 0) {
+			dprintk("RPC:       autobind not supported with AF_VSOCK\n");
+			ret = ERR_PTR(-EINVAL);
+			goto out_err;
+		}
+		xprt_set_bound(xprt);
+		xs_format_peer_addresses(xprt, "vsock", "vsock" /* TODO register official netid? */);
+		break;
+	default:
+		ret = ERR_PTR(-EAFNOSUPPORT);
+		goto out_err;
+	}
+
+	dprintk("RPC:       set up xprt to %s (port %s) via AF_VSOCK\n",
+		xprt->address_strings[RPC_DISPLAY_ADDR],
+		xprt->address_strings[RPC_DISPLAY_PORT]);
+
+	if (try_module_get(THIS_MODULE))
+		return xprt;
+	ret = ERR_PTR(-EINVAL);
+out_err:
+	xs_xprt_free(xprt);
+	return ret;
+}
+#endif
+
 static struct xprt_class	xs_local_transport = {
 	.list		= LIST_HEAD_INIT(xs_local_transport.list),
 	.name		= "named UNIX socket",
@@ -3235,6 +3594,16 @@ static struct xprt_class	xs_bc_tcp_transport = {
 	.setup		= xs_setup_bc_tcp,
 };
 
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+static struct xprt_class	xs_vsock_transport = {
+	.list		= LIST_HEAD_INIT(xs_vsock_transport.list),
+	.name		= "vsock",
+	.owner		= THIS_MODULE,
+	.ident		= XPRT_TRANSPORT_VSOCK,
+	.setup		= xs_setup_vsock,
+};
+#endif
+
 /**
  * init_socket_xprt - set up xprtsock's sysctls, register with RPC client
  *
@@ -3250,6 +3619,9 @@ int init_socket_xprt(void)
 	xprt_register_transport(&xs_udp_transport);
 	xprt_register_transport(&xs_tcp_transport);
 	xprt_register_transport(&xs_bc_tcp_transport);
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	xprt_register_transport(&xs_vsock_transport);
+#endif
 
 	return 0;
 }
@@ -3271,6 +3643,9 @@ void cleanup_socket_xprt(void)
 	xprt_unregister_transport(&xs_udp_transport);
 	xprt_unregister_transport(&xs_tcp_transport);
 	xprt_unregister_transport(&xs_bc_tcp_transport);
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	xprt_unregister_transport(&xs_vsock_transport);
+#endif
 }
 
 static int param_set_uint_minmax(const char *val,
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 07/14] SUNRPC: drop unnecessary svc_bc_tcp_create() helper
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 06/14] SUNRPC: add AF_VSOCK support to xprtsock.c Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-10-31 13:55   ` Jeff Layton
  2017-06-30 13:23 ` [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c Stefan Hajnoczi
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

svc_bc_tcp_create() is a helper function that simply calls
svc_bc_create_socket() with an added IPPROTO_TCP argument.

svc_bc_create_socket() then checks that the protocol argument is indeed
IPPROTO_TCP.

This isn't necessary since svc_bc_tcp_create() is the only
svc_bc_create_socket() caller.  The next patch adds a second caller for
AF_VSOCK where IPPROTO_TCP will not be used.

Scrap this scheme and just call svc_bc_create_socket() directly.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 net/sunrpc/svcsock.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..394e7a2 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -71,7 +71,7 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *, int,
 					  struct net *, struct sockaddr *,
 					  int, int);
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
-static struct svc_xprt *svc_bc_create_socket(struct svc_serv *, int,
+static struct svc_xprt *svc_bc_create_socket(struct svc_serv *,
 					     struct net *, struct sockaddr *,
 					     int, int);
 static void svc_bc_sock_free(struct svc_xprt *xprt);
@@ -1212,25 +1212,17 @@ static struct svc_xprt *svc_tcp_create(struct svc_serv *serv,
 }
 
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
-static struct svc_xprt *svc_bc_create_socket(struct svc_serv *, int,
+static struct svc_xprt *svc_bc_create_socket(struct svc_serv *,
 					     struct net *, struct sockaddr *,
 					     int, int);
 static void svc_bc_sock_free(struct svc_xprt *xprt);
 
-static struct svc_xprt *svc_bc_tcp_create(struct svc_serv *serv,
-				       struct net *net,
-				       struct sockaddr *sa, int salen,
-				       int flags)
-{
-	return svc_bc_create_socket(serv, IPPROTO_TCP, net, sa, salen, flags);
-}
-
 static void svc_bc_tcp_sock_detach(struct svc_xprt *xprt)
 {
 }
 
 static struct svc_xprt_ops svc_tcp_bc_ops = {
-	.xpo_create = svc_bc_tcp_create,
+	.xpo_create = svc_bc_create_socket,
 	.xpo_detach = svc_bc_tcp_sock_detach,
 	.xpo_free = svc_bc_sock_free,
 	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
@@ -1615,7 +1607,6 @@ static void svc_sock_free(struct svc_xprt *xprt)
  * Create a back channel svc_xprt which shares the fore channel socket.
  */
 static struct svc_xprt *svc_bc_create_socket(struct svc_serv *serv,
-					     int protocol,
 					     struct net *net,
 					     struct sockaddr *sin, int len,
 					     int flags)
@@ -1623,12 +1614,6 @@ static struct svc_xprt *svc_bc_create_socket(struct svc_serv *serv,
 	struct svc_sock *svsk;
 	struct svc_xprt *xprt;
 
-	if (protocol != IPPROTO_TCP) {
-		printk(KERN_WARNING "svc: only TCP sockets"
-			" supported on shared back channel\n");
-		return ERR_PTR(-EINVAL);
-	}
-
 	svsk = kzalloc(sizeof(*svsk), GFP_KERNEL);
 	if (!svsk)
 		return ERR_PTR(-ENOMEM);
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 07/14] SUNRPC: drop unnecessary svc_bc_tcp_create() helper Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-10-31 14:10   ` Jeff Layton
  2017-06-30 13:23 ` [PATCH v3 09/14] SUNRPC: add AF_VSOCK backchannel support Stefan Hajnoczi
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

Allow creation of AF_VSOCK service xprts.  This is needed for the
"vsock-bc" backchannel.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/linux/sunrpc/svc_xprt.h | 12 ++++++++++++
 net/sunrpc/svc_xprt.c           | 18 ++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index ddb7f94..938f9ee 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -8,6 +8,7 @@
 #define SUNRPC_SVC_XPRT_H
 
 #include <linux/sunrpc/svc.h>
+#include <linux/vm_sockets.h>
 
 struct module;
 
@@ -156,12 +157,15 @@ static inline unsigned short svc_addr_port(const struct sockaddr *sa)
 {
 	const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
 	const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
+	const struct sockaddr_vm *svm = (const struct sockaddr_vm *)sa;
 
 	switch (sa->sa_family) {
 	case AF_INET:
 		return ntohs(sin->sin_port);
 	case AF_INET6:
 		return ntohs(sin6->sin6_port);
+	case AF_VSOCK:
+		return svm->svm_port;
 	}
 
 	return 0;
@@ -174,6 +178,8 @@ static inline size_t svc_addr_len(const struct sockaddr *sa)
 		return sizeof(struct sockaddr_in);
 	case AF_INET6:
 		return sizeof(struct sockaddr_in6);
+	case AF_VSOCK:
+		return sizeof(struct sockaddr_vm);
 	}
 	BUG();
 }
@@ -193,6 +199,7 @@ static inline char *__svc_print_addr(const struct sockaddr *addr,
 {
 	const struct sockaddr_in *sin = (const struct sockaddr_in *)addr;
 	const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)addr;
+	const struct sockaddr_vm *svm = (const struct sockaddr_vm *)addr;
 
 	switch (addr->sa_family) {
 	case AF_INET:
@@ -206,6 +213,11 @@ static inline char *__svc_print_addr(const struct sockaddr *addr,
 			ntohs(sin6->sin6_port));
 		break;
 
+	case AF_VSOCK:
+		snprintf(buf, len, "%u, port=%u",
+			 svm->svm_cid, svm->svm_port);
+		break;
+
 	default:
 		snprintf(buf, len, "unknown address type: %d", addr->sa_family);
 		break;
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 7bfe1fb..2a8ccc1 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -10,6 +10,7 @@
 #include <linux/kthread.h>
 #include <linux/slab.h>
 #include <net/sock.h>
+#include <net/af_vsock.h>
 #include <linux/sunrpc/addr.h>
 #include <linux/sunrpc/stats.h>
 #include <linux/sunrpc/svc_xprt.h>
@@ -195,6 +196,13 @@ static struct svc_xprt *__svc_xpo_create(struct svc_xprt_class *xcl,
 		.sin6_port		= htons(port),
 	};
 #endif
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	struct sockaddr_vm svm = {
+		.svm_family		= AF_VSOCK,
+		.svm_cid		= VMADDR_CID_ANY,
+		.svm_port		= port,
+	};
+#endif
 	struct sockaddr *sap;
 	size_t len;
 
@@ -209,6 +217,12 @@ static struct svc_xprt *__svc_xpo_create(struct svc_xprt_class *xcl,
 		len = sizeof(sin6);
 		break;
 #endif
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	case AF_VSOCK:
+		sap = (struct sockaddr *)&svm;
+		len = sizeof(svm);
+		break;
+#endif
 	default:
 		return ERR_PTR(-EAFNOSUPPORT);
 	}
@@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
 	case AF_INET6:
 		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
 			< PROT_SOCK;
+	case AF_VSOCK:
+		return ((struct sockaddr_vm *)sin)->svm_port <=
+			LAST_RESERVED_PORT;
+
 	default:
 		return 0;
 	}
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 09/14] SUNRPC: add AF_VSOCK backchannel support
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 10/14] NFS: add AF_VSOCK support to NFS client Stefan Hajnoczi
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 net/sunrpc/svcsock.c  | 27 +++++++++++++++++++++++++++
 net/sunrpc/xprtsock.c | 25 +++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 394e7a2..0c09f3f 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1236,14 +1236,41 @@ static struct svc_xprt_class svc_tcp_bc_class = {
 	.xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
 };
 
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+static void svc_bc_vsock_sock_detach(struct svc_xprt *xprt)
+{
+}
+
+static struct svc_xprt_ops svc_vsock_bc_ops = {
+	.xpo_create = svc_bc_create_socket,
+	.xpo_detach = svc_bc_vsock_sock_detach,
+	.xpo_free = svc_bc_sock_free,
+	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
+	.xpo_secure_port = svc_sock_secure_port,
+};
+
+static struct svc_xprt_class svc_vsock_bc_class = {
+	.xcl_name = "vsock-bc",
+	.xcl_owner = THIS_MODULE,
+	.xcl_ops = &svc_vsock_bc_ops,
+	.xcl_max_payload = RPCSVC_MAXPAYLOAD,
+};
+#endif /* CONFIG_SUNRPC_XPRT_VSOCK */
+
 static void svc_init_bc_xprt_sock(void)
 {
 	svc_reg_xprt_class(&svc_tcp_bc_class);
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	svc_reg_xprt_class(&svc_vsock_bc_class);
+#endif
 }
 
 static void svc_cleanup_bc_xprt_sock(void)
 {
 	svc_unreg_xprt_class(&svc_tcp_bc_class);
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	svc_unreg_xprt_class(&svc_vsock_bc_class);
+#endif
 }
 #else /* CONFIG_SUNRPC_BACKCHANNEL */
 static void svc_init_bc_xprt_sock(void)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index cc343b91..ccadb3a 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1430,6 +1430,24 @@ static size_t xs_tcp_bc_maxpayload(struct rpc_xprt *xprt)
 {
 	return PAGE_SIZE;
 }
+
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+static int xs_vsock_bc_up(struct svc_serv *serv, struct net *net)
+{
+	int ret;
+
+	ret = svc_create_xprt(serv, "vsock-bc", net, AF_VSOCK, 0,
+			      SVC_SOCK_ANONYMOUS);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+static size_t xs_vsock_bc_maxpayload(struct rpc_xprt *xprt)
+{
+	return PAGE_SIZE;
+}
+#endif /* !CONFIG_SUNRPC_XPRT_VSOCK */
 #else
 static inline int _xs_stream_read_data(struct rpc_xprt *xprt,
 				       struct xdr_skb_reader *desc)
@@ -3494,6 +3512,13 @@ static struct rpc_xprt_ops xs_vsock_ops = {
 	.close			= xs_tcp_shutdown,
 	.destroy		= xs_destroy,
 	.print_stats		= xs_vsock_print_stats,
+#ifdef CONFIG_SUNRPC_BACKCHANNEL
+	.bc_setup		= xprt_setup_bc,
+	.bc_up			= xs_vsock_bc_up,
+	.bc_maxpayload		= xs_vsock_bc_maxpayload,
+	.bc_free_rqst		= xprt_free_bc_rqst,
+	.bc_destroy		= xprt_destroy_bc,
+#endif
 };
 
 static const struct rpc_timeout xs_vsock_default_timeout = {
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 10/14] NFS: add AF_VSOCK support to NFS client
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 09/14] SUNRPC: add AF_VSOCK backchannel support Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 11/14] nfsd: support vsock xprt creation Stefan Hajnoczi
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

This patch adds AF_VSOCK to the NFS client.  Mounts can now use the
"vsock" proto option and pass "vsock:<cid>" address strings, which are
interpreted by sunrpc for xprt creation.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 fs/nfs/client.c |  2 ++
 fs/nfs/super.c  | 11 ++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index ee5ddbd..80688a7 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -35,6 +35,7 @@
 #include <linux/vfs.h>
 #include <linux/inet.h>
 #include <linux/in6.h>
+#include <linux/vm_sockets.h>
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <net/ipv6.h>
@@ -447,6 +448,7 @@ void nfs_init_timeout_values(struct rpc_timeout *to, int proto,
 	switch (proto) {
 	case XPRT_TRANSPORT_TCP:
 	case XPRT_TRANSPORT_RDMA:
+	case XPRT_TRANSPORT_VSOCK:
 		if (retrans == NFS_UNSPEC_RETRANS)
 			to->to_retries = NFS_DEF_TCP_RETRANS;
 		if (timeo == NFS_UNSPEC_TIMEO || to->to_retries == 0)
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index eceb4ea..675404f 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -191,7 +191,7 @@ static const match_table_t nfs_mount_option_tokens = {
 
 enum {
 	Opt_xprt_udp, Opt_xprt_udp6, Opt_xprt_tcp, Opt_xprt_tcp6, Opt_xprt_rdma,
-	Opt_xprt_rdma6,
+	Opt_xprt_rdma6, Opt_xprt_vsock,
 
 	Opt_xprt_err
 };
@@ -203,6 +203,7 @@ static const match_table_t nfs_xprt_protocol_tokens = {
 	{ Opt_xprt_tcp6, "tcp6" },
 	{ Opt_xprt_rdma, "rdma" },
 	{ Opt_xprt_rdma6, "rdma6" },
+	{ Opt_xprt_vsock, "vsock" },
 
 	{ Opt_xprt_err, NULL }
 };
@@ -964,6 +965,8 @@ static int nfs_verify_server_address(struct sockaddr *addr)
 		struct in6_addr *sa = &((struct sockaddr_in6 *)addr)->sin6_addr;
 		return !ipv6_addr_any(sa);
 	}
+	case AF_VSOCK:
+		return 1;
 	}
 
 	dfprintk(MOUNT, "NFS: Invalid IP address specified\n");
@@ -993,6 +996,7 @@ static void nfs_validate_transport_protocol(struct nfs_parsed_mount_data *mnt)
 	case XPRT_TRANSPORT_UDP:
 	case XPRT_TRANSPORT_TCP:
 	case XPRT_TRANSPORT_RDMA:
+	case XPRT_TRANSPORT_VSOCK:
 		break;
 	default:
 		mnt->nfs_server.protocol = XPRT_TRANSPORT_TCP;
@@ -1474,6 +1478,11 @@ static int nfs_parse_mount_options(char *raw,
 				mnt->nfs_server.protocol = XPRT_TRANSPORT_RDMA;
 				xprt_load_transport(string);
 				break;
+			case Opt_xprt_vsock:
+				protofamily = AF_VSOCK;
+				mnt->flags &= ~NFS_MOUNT_TCP;
+				mnt->nfs_server.protocol = XPRT_TRANSPORT_VSOCK;
+				break;
 			default:
 				dfprintk(MOUNT, "NFS:   unrecognized "
 						"transport protocol\n");
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 11/14] nfsd: support vsock xprt creation
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (9 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 10/14] NFS: add AF_VSOCK support to NFS client Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 12/14] SUNRPC: add AF_VSOCK lock class Stefan Hajnoczi
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

nfs-utils will write the vsock address details to /proc/fs/nfsd/portlist
to start listening for NFS client connections.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 fs/nfsd/nfsctl.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 6493df6..1b71c28 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -768,15 +768,22 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net)
 	if (err != 0)
 		return err;
 
-	err = svc_create_xprt(nn->nfsd_serv, transport, net,
-				PF_INET, port, SVC_SOCK_ANONYMOUS);
-	if (err < 0)
-		goto out_err;
+	if (!strcmp(transport, "vsock")) {
+		err = svc_create_xprt(nn->nfsd_serv, transport, net,
+					AF_VSOCK, port, SVC_SOCK_ANONYMOUS);
+		if (err < 0)
+			goto out_err;
+	} else {
+		err = svc_create_xprt(nn->nfsd_serv, transport, net,
+					PF_INET, port, SVC_SOCK_ANONYMOUS);
+		if (err < 0)
+			goto out_err;
 
-	err = svc_create_xprt(nn->nfsd_serv, transport, net,
-				PF_INET6, port, SVC_SOCK_ANONYMOUS);
-	if (err < 0 && err != -EAFNOSUPPORT)
-		goto out_close;
+		err = svc_create_xprt(nn->nfsd_serv, transport, net,
+					PF_INET6, port, SVC_SOCK_ANONYMOUS);
+		if (err < 0 && err != -EAFNOSUPPORT)
+			goto out_close;
+	}
 
 	/* Decrease the count, but don't shut down the service */
 	nn->nfsd_serv->sv_nrthreads--;
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 12/14] SUNRPC: add AF_VSOCK lock class
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (10 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 11/14] nfsd: support vsock xprt creation Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 13/14] SUNRPC: vsock svcsock support Stefan Hajnoczi
  2017-06-30 13:23 ` [PATCH v3 14/14] SUNRPC: add AF_VSOCK support to auth.unix.ip Stefan Hajnoczi
  13 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 net/sunrpc/svcsock.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 0c09f3f..e67b097 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -78,8 +78,8 @@ static void svc_bc_sock_free(struct svc_xprt *xprt);
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-static struct lock_class_key svc_key[2];
-static struct lock_class_key svc_slock_key[2];
+static struct lock_class_key svc_key[3];
+static struct lock_class_key svc_slock_key[3];
 
 static void svc_reclassify_socket(struct socket *sock)
 {
@@ -103,6 +103,13 @@ static void svc_reclassify_socket(struct socket *sock)
 					      &svc_key[1]);
 		break;
 
+	case AF_VSOCK:
+		sock_lock_init_class_and_name(sk, "slock-AF_VSOCK-NFSD",
+					      &svc_slock_key[2],
+					      "sk_xprt.xpt_lock-AF_VSOCK-NFSD",
+					      &svc_key[2]);
+		break;
+
 	default:
 		BUG();
 	}
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 13/14] SUNRPC: vsock svcsock support
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (11 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 12/14] SUNRPC: add AF_VSOCK lock class Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-11-07 14:12   ` Jeff Layton
  2017-06-30 13:23 ` [PATCH v3 14/14] SUNRPC: add AF_VSOCK support to auth.unix.ip Stefan Hajnoczi
  13 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 net/sunrpc/svcsock.c | 214 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 190 insertions(+), 24 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index e67b097..86cce8f 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -42,7 +42,9 @@
 #include <net/udp.h>
 #include <net/tcp.h>
 #include <net/tcp_states.h>
+#include <net/af_vsock.h>
 #include <linux/uaccess.h>
+#include <asm/uaccess.h>
 #include <asm/ioctls.h>
 #include <trace/events/skb.h>
 
@@ -64,7 +66,7 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *, struct socket *,
 static int		svc_udp_recvfrom(struct svc_rqst *);
 static int		svc_udp_sendto(struct svc_rqst *);
 static void		svc_sock_detach(struct svc_xprt *);
-static void		svc_tcp_sock_detach(struct svc_xprt *);
+static void		svc_common_sock_detach(struct svc_xprt *);
 static void		svc_sock_free(struct svc_xprt *);
 
 static struct svc_xprt *svc_create_socket(struct svc_serv *, int,
@@ -810,9 +812,9 @@ static void svc_tcp_state_change(struct sock *sk)
 }
 
 /*
- * Accept a TCP connection
+ * Accept an incoming connection
  */
-static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
+static struct svc_xprt *svc_common_accept(struct svc_xprt *xprt)
 {
 	struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
 	struct sockaddr_storage addr;
@@ -824,7 +826,7 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
 	int		err, slen;
 	RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]);
 
-	dprintk("svc: tcp_accept %p sock %p\n", svsk, sock);
+	dprintk("svc: %s %p sock %p\n", __func__, svsk, sock);
 	if (!sock)
 		return NULL;
 
@@ -877,7 +879,7 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
 	svc_xprt_set_remote(&newsvsk->sk_xprt, sin, slen);
 	err = kernel_getsockname(newsock, sin, &slen);
 	if (unlikely(err < 0)) {
-		dprintk("svc_tcp_accept: kernel_getsockname error %d\n", -err);
+		dprintk("%s: kernel_getsockname error %d\n", __func__, -err);
 		slen = offsetof(struct sockaddr, sa_data);
 	}
 	svc_xprt_set_local(&newsvsk->sk_xprt, sin, slen);
@@ -1131,7 +1133,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
 		rqstp->rq_arg.page_len = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
 
 	rqstp->rq_xprt_ctxt   = NULL;
-	rqstp->rq_prot	      = IPPROTO_TCP;
+	rqstp->rq_prot	      = IPPROTO_TCP; /* TODO vsock should either use 0 or IPPROTO_MAX */
 	if (test_bit(XPT_LOCAL, &svsk->sk_xprt.xpt_flags))
 		set_bit(RQ_LOCAL, &rqstp->rq_flags);
 	else
@@ -1200,13 +1202,13 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
 }
 
 /*
- * Setup response header. TCP has a 4B record length field.
+ * Setup response header. Byte stream transports have a 4B record length field.
  */
-static void svc_tcp_prep_reply_hdr(struct svc_rqst *rqstp)
+static void svc_stream_prep_reply_hdr(struct svc_rqst *rqstp)
 {
 	struct kvec *resv = &rqstp->rq_res.head[0];
 
-	/* tcp needs a space for the record length... */
+	/* space for the record length... */
 	svc_putnl(resv, 0);
 }
 
@@ -1232,7 +1234,7 @@ static struct svc_xprt_ops svc_tcp_bc_ops = {
 	.xpo_create = svc_bc_create_socket,
 	.xpo_detach = svc_bc_tcp_sock_detach,
 	.xpo_free = svc_bc_sock_free,
-	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
+	.xpo_prep_reply_hdr = svc_stream_prep_reply_hdr,
 	.xpo_secure_port = svc_sock_secure_port,
 };
 
@@ -1244,6 +1246,145 @@ static struct svc_xprt_class svc_tcp_bc_class = {
 };
 
 #ifdef CONFIG_SUNRPC_XPRT_VSOCK
+/*
+ * A data_ready event on a listening socket means there's a connection
+ * pending. Do not use state_change as a substitute for it.
+ */
+static void svc_vsock_listen_data_ready(struct sock *sk)
+{
+	struct svc_sock	*svsk = (struct svc_sock *)sk->sk_user_data;
+	wait_queue_head_t *wq;
+
+	dprintk("svc: socket %p AF_VSOCK (listen) state change %d\n",
+		sk, sk->sk_state);
+
+	/*
+	 * This callback may called twice when a new connection
+	 * is established as a child socket inherits everything
+	 * from a parent VSOCK_SS_LISTEN socket.
+	 * 1) data_ready method of the parent socket will be called
+	 *    when one of child sockets become SS_CONNECTED.
+	 * 2) data_ready method of the child socket may be called
+	 *    when it receives data before the socket is accepted.
+	 * In case of 2, we should ignore it silently.
+	 */
+	if (sk->sk_state == VSOCK_SS_LISTEN) {
+		if (svsk) {
+			set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
+			svc_xprt_enqueue(&svsk->sk_xprt);
+		} else
+			printk("svc: socket %p: no user data\n", sk);
+	}
+
+	wq = sk_sleep(sk);
+	if (wq && waitqueue_active(wq))
+		wake_up_interruptible_all(wq);
+}
+
+/*
+ * A state change on a connected socket means it's dying or dead.
+ */
+static void svc_vsock_state_change(struct sock *sk)
+{
+	struct svc_sock	*svsk = (struct svc_sock *)sk->sk_user_data;
+	wait_queue_head_t *wq = sk_sleep(sk);
+
+	dprintk("svc: socket %p AF_VSOCK (connected) state change %d (svsk %p)\n",
+		sk, sk->sk_state, sk->sk_user_data);
+
+	if (!svsk)
+		printk("svc: socket %p: no user data\n", sk);
+	else {
+		set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
+		svc_xprt_enqueue(&svsk->sk_xprt);
+	}
+	if (wq && waitqueue_active(wq))
+		wake_up_interruptible_all(wq);
+}
+
+static void svc_vsock_data_ready(struct sock *sk)
+{
+	struct svc_sock *svsk = (struct svc_sock *)sk->sk_user_data;
+	wait_queue_head_t *wq = sk_sleep(sk);
+
+	dprintk("svc: socket %p AF_VSOCK data ready (svsk %p)\n",
+		sk, sk->sk_user_data);
+	if (svsk) {
+		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
+		svc_xprt_enqueue(&svsk->sk_xprt);
+	}
+	if (wq && waitqueue_active(wq))
+		wake_up_interruptible(wq);
+}
+
+static int svc_vsock_has_wspace(struct svc_xprt *xprt)
+{
+	struct svc_sock *svsk =	container_of(xprt, struct svc_sock, sk_xprt);
+
+	if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
+		return 1;
+	return !test_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
+}
+
+static struct svc_xprt *svc_vsock_create(struct svc_serv *serv,
+					 struct net *net,
+					 struct sockaddr *sa, int salen,
+					 int flags)
+{
+	return svc_create_socket(serv, 0, net, sa, salen, flags);
+}
+
+static struct svc_xprt_ops svc_vsock_ops = {
+	.xpo_create = svc_vsock_create,
+	.xpo_recvfrom = svc_tcp_recvfrom,
+	.xpo_sendto = svc_tcp_sendto,
+	.xpo_release_rqst = svc_release_skb,
+	.xpo_detach = svc_common_sock_detach,
+	.xpo_free = svc_sock_free,
+	.xpo_prep_reply_hdr = svc_stream_prep_reply_hdr,
+	.xpo_has_wspace = svc_vsock_has_wspace,
+	.xpo_accept = svc_common_accept,
+	.xpo_secure_port = svc_sock_secure_port,
+};
+
+static struct svc_xprt_class svc_vsock_class = {
+	.xcl_name = "vsock",
+	.xcl_owner = THIS_MODULE,
+	.xcl_ops = &svc_vsock_ops,
+	.xcl_max_payload = RPCSVC_MAXPAYLOAD,
+	.xcl_ident = XPRT_TRANSPORT_VSOCK,
+};
+
+static void svc_vsock_init(struct svc_sock *svsk, struct svc_serv *serv)
+{
+	struct sock *sk = svsk->sk_sk;
+
+	svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_vsock_class,
+		      &svsk->sk_xprt, serv);
+	set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
+	set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
+	if (sk->sk_state == VSOCK_SS_LISTEN) {
+		dprintk("setting up AF_VSOCK socket for listening\n");
+		set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
+		sk->sk_data_ready = svc_vsock_listen_data_ready;
+		set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
+	} else {
+		dprintk("setting up AF_VSOCK socket for reading\n");
+		sk->sk_state_change = svc_vsock_state_change;
+		sk->sk_data_ready = svc_vsock_data_ready;
+		sk->sk_write_space = svc_write_space;
+
+		svsk->sk_reclen = 0;
+		svsk->sk_tcplen = 0;
+		svsk->sk_datalen = 0;
+		memset(&svsk->sk_pages[0], 0, sizeof(svsk->sk_pages));
+
+		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
+		if (sk->sk_state != SS_CONNECTED)
+			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
+	}
+}
+
 static void svc_bc_vsock_sock_detach(struct svc_xprt *xprt)
 {
 }
@@ -1252,7 +1393,7 @@ static struct svc_xprt_ops svc_vsock_bc_ops = {
 	.xpo_create = svc_bc_create_socket,
 	.xpo_detach = svc_bc_vsock_sock_detach,
 	.xpo_free = svc_bc_sock_free,
-	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
+	.xpo_prep_reply_hdr = svc_stream_prep_reply_hdr,
 	.xpo_secure_port = svc_sock_secure_port,
 };
 
@@ -1294,11 +1435,11 @@ static struct svc_xprt_ops svc_tcp_ops = {
 	.xpo_recvfrom = svc_tcp_recvfrom,
 	.xpo_sendto = svc_tcp_sendto,
 	.xpo_release_rqst = svc_release_skb,
-	.xpo_detach = svc_tcp_sock_detach,
+	.xpo_detach = svc_common_sock_detach,
 	.xpo_free = svc_sock_free,
-	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
+	.xpo_prep_reply_hdr = svc_stream_prep_reply_hdr,
 	.xpo_has_wspace = svc_tcp_has_wspace,
-	.xpo_accept = svc_tcp_accept,
+	.xpo_accept = svc_common_accept,
 	.xpo_secure_port = svc_sock_secure_port,
 	.xpo_kill_temp_xprt = svc_tcp_kill_temp_xprt,
 };
@@ -1315,6 +1456,9 @@ void svc_init_xprt_sock(void)
 {
 	svc_reg_xprt_class(&svc_tcp_class);
 	svc_reg_xprt_class(&svc_udp_class);
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	svc_reg_xprt_class(&svc_vsock_class);
+#endif
 	svc_init_bc_xprt_sock();
 }
 
@@ -1322,6 +1466,9 @@ void svc_cleanup_xprt_sock(void)
 {
 	svc_unreg_xprt_class(&svc_tcp_class);
 	svc_unreg_xprt_class(&svc_udp_class);
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	svc_unreg_xprt_class(&svc_vsock_class);
+#endif
 	svc_cleanup_bc_xprt_sock();
 }
 
@@ -1417,6 +1564,10 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
 		svc_udp_init(svsk, serv);
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	else if (inet->sk_family == AF_VSOCK)
+		svc_vsock_init(svsk, serv);
+#endif
 	else
 		svc_tcp_init(svsk, serv);
 
@@ -1468,13 +1619,22 @@ int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
 
 	if (!so)
 		return err;
-	err = -EAFNOSUPPORT;
-	if ((so->sk->sk_family != PF_INET) && (so->sk->sk_family != PF_INET6))
-		goto out;
-	err =  -EPROTONOSUPPORT;
-	if (so->sk->sk_protocol != IPPROTO_TCP &&
-	    so->sk->sk_protocol != IPPROTO_UDP)
+	err = -EPROTONOSUPPORT;
+	switch (so->sk->sk_family) {
+	case PF_INET:
+	case PF_INET6:
+		if (so->sk->sk_protocol != IPPROTO_TCP &&
+		    so->sk->sk_protocol != IPPROTO_UDP)
+			goto out;
+		break;
+	case PF_VSOCK:
+		if (so->sk->sk_protocol != 0)
+			goto out;
+		break;
+	default:
+		err = -EAFNOSUPPORT;
 		goto out;
+	}
 	err = -EISCONN;
 	if (so->state > SS_UNCONNECTED)
 		goto out;
@@ -1521,7 +1681,8 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
 			serv->sv_program->pg_name, protocol,
 			__svc_print_addr(sin, buf, sizeof(buf)));
 
-	if (protocol != IPPROTO_UDP && protocol != IPPROTO_TCP) {
+	if (sin->sa_family != AF_VSOCK &&
+	    protocol != IPPROTO_UDP && protocol != IPPROTO_TCP) {
 		printk(KERN_WARNING "svc: only UDP and TCP "
 				"sockets supported\n");
 		return ERR_PTR(-EINVAL);
@@ -1535,6 +1696,11 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
 	case AF_INET:
 		family = PF_INET;
 		break;
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	case AF_VSOCK:
+		family = PF_VSOCK;
+		break;
+#endif
 	default:
 		return ERR_PTR(-EINVAL);
 	}
@@ -1566,7 +1732,7 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
 	if (error < 0)
 		goto bummer;
 
-	if (protocol == IPPROTO_TCP) {
+	if (type == SOCK_STREAM) {
 		if ((error = kernel_listen(sock, 64)) < 0)
 			goto bummer;
 	}
@@ -1607,11 +1773,11 @@ static void svc_sock_detach(struct svc_xprt *xprt)
 /*
  * Disconnect the socket, and reset the callbacks
  */
-static void svc_tcp_sock_detach(struct svc_xprt *xprt)
+static void svc_common_sock_detach(struct svc_xprt *xprt)
 {
 	struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
 
-	dprintk("svc: svc_tcp_sock_detach(%p)\n", svsk);
+	dprintk("svc: %s(%p)\n", __func__, svsk);
 
 	svc_sock_detach(xprt);
 
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 14/14] SUNRPC: add AF_VSOCK support to auth.unix.ip
  2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
                   ` (12 preceding siblings ...)
  2017-06-30 13:23 ` [PATCH v3 13/14] SUNRPC: vsock svcsock support Stefan Hajnoczi
@ 2017-06-30 13:23 ` Stefan Hajnoczi
  2017-07-06 18:46   ` Abbas Naderi
  13 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-06-30 13:23 UTC (permalink / raw)
  To: linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever, Stefan Hajnoczi

The ip_map currently supports AF_INET and AF_INET6.  It actually
converts IPv4 to IPv6 addresses.  We can't do that for AF_VSOCK but a
union will allow both IPv6 and vsock sockaddr structs to be used.

The cache userspace interface now supports 'vsock:CID' syntax for
AF_VSOCK addresses.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 net/sunrpc/svcauth_unix.c | 146 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 115 insertions(+), 31 deletions(-)

diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index f81eaa8..b33ea0b 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -89,7 +89,15 @@ EXPORT_SYMBOL_GPL(unix_domain_find);
 struct ip_map {
 	struct cache_head	h;
 	char			m_class[8]; /* e.g. "nfsd" */
-	struct in6_addr		m_addr;
+	union {
+		struct sockaddr		m_sa;
+
+		/* For AF_INET6 and AF_INET (we map to IPv6) */
+		struct sockaddr_in6	m_sin6;
+
+		/* For AF_VSOCK */
+		struct sockaddr_vm	m_svm;
+	};
 	struct unix_domain	*m_client;
 };
 
@@ -112,8 +120,22 @@ static int ip_map_match(struct cache_head *corig, struct cache_head *cnew)
 {
 	struct ip_map *orig = container_of(corig, struct ip_map, h);
 	struct ip_map *new = container_of(cnew, struct ip_map, h);
-	return strcmp(orig->m_class, new->m_class) == 0 &&
-	       ipv6_addr_equal(&orig->m_addr, &new->m_addr);
+
+	if (strcmp(orig->m_class, new->m_class) != 0)
+		return 0;
+
+	if (orig->m_sa.sa_family != new->m_sa.sa_family)
+		return 0;
+
+	switch (orig->m_sa.sa_family) {
+	case AF_INET6:
+		return ipv6_addr_equal(&orig->m_sin6.sin6_addr,
+				       &new->m_sin6.sin6_addr);
+
+	case AF_VSOCK:
+		return orig->m_svm.svm_cid == new->m_svm.svm_cid;
+	}
+	return 0;
 }
 static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
 {
@@ -121,7 +143,14 @@ static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
 	struct ip_map *item = container_of(citem, struct ip_map, h);
 
 	strcpy(new->m_class, item->m_class);
-	new->m_addr = item->m_addr;
+	switch (item->m_sa.sa_family) {
+	case AF_INET6:
+		new->m_sin6 = item->m_sin6;
+		break;
+	case AF_VSOCK:
+		new->m_svm = item->m_svm;
+		break;
+	}
 }
 static void update(struct cache_head *cnew, struct cache_head *citem)
 {
@@ -145,19 +174,30 @@ static void ip_map_request(struct cache_detail *cd,
 				  char **bpp, int *blen)
 {
 	char text_addr[40];
+	struct in6_addr *addr;
 	struct ip_map *im = container_of(h, struct ip_map, h);
 
-	if (ipv6_addr_v4mapped(&(im->m_addr))) {
-		snprintf(text_addr, 20, "%pI4", &im->m_addr.s6_addr32[3]);
-	} else {
-		snprintf(text_addr, 40, "%pI6", &im->m_addr);
+	switch (im->m_sa.sa_family) {
+	case AF_INET6:
+		addr = &im->m_sin6.sin6_addr;
+		if (ipv6_addr_v4mapped(addr)) {
+			snprintf(text_addr, 20, "%pI4", &addr->s6_addr32[3]);
+		} else {
+			snprintf(text_addr, 40, "%pI6", addr);
+		}
+		break;
+
+	case AF_VSOCK:
+		snprintf(text_addr, 10, "vsock:%u", im->m_svm.svm_cid);
+		break;
 	}
+
 	qword_add(bpp, blen, im->m_class);
 	qword_add(bpp, blen, text_addr);
 	(*bpp)[-1] = '\n';
 }
 
-static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char *class, struct in6_addr *addr);
+static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char *class, struct sockaddr *sap);
 static int __ip_map_update(struct cache_detail *cd, struct ip_map *ipm, struct unix_domain *udom, time_t expiry);
 
 static int ip_map_parse(struct cache_detail *cd,
@@ -173,6 +213,7 @@ static int ip_map_parse(struct cache_detail *cd,
 		struct sockaddr		sa;
 		struct sockaddr_in	s4;
 		struct sockaddr_in6	s6;
+		struct sockaddr_vm	svm;
 	} address;
 	struct sockaddr_in6 sin6;
 	int err;
@@ -201,11 +242,15 @@ static int ip_map_parse(struct cache_detail *cd,
 		sin6.sin6_family = AF_INET6;
 		ipv6_addr_set_v4mapped(address.s4.sin_addr.s_addr,
 				&sin6.sin6_addr);
+		address.s6 = sin6;
 		break;
 #if IS_ENABLED(CONFIG_IPV6)
 	case AF_INET6:
-		memcpy(&sin6, &address.s6, sizeof(sin6));
-		break;
+		break; /* Do nothing */
+#endif
+#ifdef CONFIG_SUNRPC_XPRT_VSOCK
+	case AF_VSOCK:
+		break; /* Do nothing */
 #endif
 	default:
 		return -EINVAL;
@@ -227,7 +272,7 @@ static int ip_map_parse(struct cache_detail *cd,
 		dom = NULL;
 
 	/* IPv6 scope IDs are ignored for now */
-	ipmp = __ip_map_lookup(cd, class, &sin6.sin6_addr);
+	ipmp = __ip_map_lookup(cd, class, &address.sa);
 	if (ipmp) {
 		err = __ip_map_update(cd, ipmp,
 			     container_of(dom, struct unix_domain, h),
@@ -247,7 +292,7 @@ static int ip_map_show(struct seq_file *m,
 		       struct cache_head *h)
 {
 	struct ip_map *im;
-	struct in6_addr addr;
+	struct in6_addr *addr;
 	char *dom = "-no-domain-";
 
 	if (h == NULL) {
@@ -256,33 +301,67 @@ static int ip_map_show(struct seq_file *m,
 	}
 	im = container_of(h, struct ip_map, h);
 	/* class addr domain */
-	addr = im->m_addr;
-
+	addr = &im->m_sin6.sin6_addr;
 	if (test_bit(CACHE_VALID, &h->flags) &&
 	    !test_bit(CACHE_NEGATIVE, &h->flags))
 		dom = im->m_client->h.name;
 
-	if (ipv6_addr_v4mapped(&addr)) {
-		seq_printf(m, "%s %pI4 %s\n",
-			im->m_class, &addr.s6_addr32[3], dom);
-	} else {
-		seq_printf(m, "%s %pI6 %s\n", im->m_class, &addr, dom);
+	switch (im->m_sa.sa_family) {
+	case AF_INET6:
+		if (ipv6_addr_v4mapped(addr)) {
+			seq_printf(m, "%s %pI4 %s\n",
+				im->m_class,
+				&addr->s6_addr32[3],
+				dom);
+		} else {
+			seq_printf(m, "%s %pI6 %s\n", im->m_class, addr, dom);
+		}
+		break;
+
+	case AF_VSOCK:
+		seq_printf(m, "%s %u %s\n",
+			im->m_class, im->m_svm.svm_cid, dom);
+		break;
 	}
 	return 0;
 }
 
+static int __ip_map_hash(struct ip_map *ipm)
+{
+	int hash;
+
+	switch (ipm->m_sa.sa_family) {
+	case AF_INET6:
+		hash = hash_ip6(&ipm->m_sin6.sin6_addr);
+		break;
+	case AF_VSOCK:
+		hash = hash_32(ipm->m_svm.svm_cid, IP_HASHBITS);
+		break;
+	default:
+		BUG();
+	}
+
+	return hash_str(ipm->m_class, IP_HASHBITS) ^ hash;
+}
 
 static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char *class,
-		struct in6_addr *addr)
+		struct sockaddr *sap)
 {
 	struct ip_map ip;
 	struct cache_head *ch;
 
 	strcpy(ip.m_class, class);
-	ip.m_addr = *addr;
-	ch = sunrpc_cache_lookup(cd, &ip.h,
-				 hash_str(class, IP_HASHBITS) ^
-				 hash_ip6(addr));
+	switch (sap->sa_family) {
+	case AF_INET6:
+		ip.m_sin6 = *(struct sockaddr_in6 *)sap;
+		break;
+	case AF_VSOCK:
+		ip.m_svm = *(struct sockaddr_vm *)sap;
+		break;
+	default:
+		return NULL;
+	}
+	ch = sunrpc_cache_lookup(cd, &ip.h, __ip_map_hash(&ip));
 
 	if (ch)
 		return container_of(ch, struct ip_map, h);
@@ -291,12 +370,12 @@ static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char *class,
 }
 
 static inline struct ip_map *ip_map_lookup(struct net *net, char *class,
-		struct in6_addr *addr)
+		struct sockaddr *sap)
 {
 	struct sunrpc_net *sn;
 
 	sn = net_generic(net, sunrpc_net_id);
-	return __ip_map_lookup(sn->ip_map_cache, class, addr);
+	return __ip_map_lookup(sn->ip_map_cache, class, sap);
 }
 
 static int __ip_map_update(struct cache_detail *cd, struct ip_map *ipm,
@@ -311,8 +390,7 @@ static int __ip_map_update(struct cache_detail *cd, struct ip_map *ipm,
 		set_bit(CACHE_NEGATIVE, &ip.h.flags);
 	ip.h.expiry_time = expiry;
 	ch = sunrpc_cache_update(cd, &ip.h, &ipm->h,
-				 hash_str(ipm->m_class, IP_HASHBITS) ^
-				 hash_ip6(&ipm->m_addr));
+				 __ip_map_hash(ipm));
 	if (!ch)
 		return -ENOMEM;
 	cache_put(ch, cd);
@@ -654,6 +732,7 @@ static struct group_info *unix_gid_find(kuid_t uid, struct svc_rqst *rqstp)
 int
 svcauth_unix_set_client(struct svc_rqst *rqstp)
 {
+	struct sockaddr *sap;
 	struct sockaddr_in *sin;
 	struct sockaddr_in6 *sin6, sin6_storage;
 	struct ip_map *ipm;
@@ -667,10 +746,15 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
 	case AF_INET:
 		sin = svc_addr_in(rqstp);
 		sin6 = &sin6_storage;
+		sin6->sin6_family = AF_INET6;
 		ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &sin6->sin6_addr);
+		sap = (struct sockaddr *)sin6;
 		break;
 	case AF_INET6:
-		sin6 = svc_addr_in6(rqstp);
+		sap = svc_addr(rqstp);
+		break;
+	case AF_VSOCK:
+		sap = svc_addr(rqstp);
 		break;
 	default:
 		BUG();
@@ -683,7 +767,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
 	ipm = ip_map_cached_get(xprt);
 	if (ipm == NULL)
 		ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class,
-				    &sin6->sin6_addr);
+				    sap);
 
 	if (ipm == NULL)
 		return SVC_DENIED;
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 14/14] SUNRPC: add AF_VSOCK support to auth.unix.ip
  2017-06-30 13:23 ` [PATCH v3 14/14] SUNRPC: add AF_VSOCK support to auth.unix.ip Stefan Hajnoczi
@ 2017-07-06 18:46   ` Abbas Naderi
  2017-07-10 18:05     ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: Abbas Naderi @ 2017-07-06 18:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-nfs, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever

Without CONFIG_SUNRPC_BACKCHANNEL, many things such as =
=E2=80=9Csvc_vsock_bc_class=E2=80=9D are undefined, and then used =
throughout the code.
Most of the =E2=80=9CCONFIG_SUNRPC_XPRT_VSOCK=E2=80=9D guards in =
sunrpc/svcsock.c should also be guarded by CONFIG_SUNRPC_BACKCHANNEL.

-A

> On Jun 30, 2017, at 6:23 AM, Stefan Hajnoczi <stefanha@redhat.com> =
wrote:
>=20
> The ip_map currently supports AF_INET and AF_INET6.  It actually
> converts IPv4 to IPv6 addresses.  We can't do that for AF_VSOCK but a
> union will allow both IPv6 and vsock sockaddr structs to be used.
>=20
> The cache userspace interface now supports 'vsock:CID' syntax for
> AF_VSOCK addresses.
>=20
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> net/sunrpc/svcauth_unix.c | 146 =
++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 115 insertions(+), 31 deletions(-)
>=20
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index f81eaa8..b33ea0b 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -89,7 +89,15 @@ EXPORT_SYMBOL_GPL(unix_domain_find);
> struct ip_map {
> 	struct cache_head	h;
> 	char			m_class[8]; /* e.g. "nfsd" */
> -	struct in6_addr		m_addr;
> +	union {
> +		struct sockaddr		m_sa;
> +
> +		/* For AF_INET6 and AF_INET (we map to IPv6) */
> +		struct sockaddr_in6	m_sin6;
> +
> +		/* For AF_VSOCK */
> +		struct sockaddr_vm	m_svm;
> +	};
> 	struct unix_domain	*m_client;
> };
>=20
> @@ -112,8 +120,22 @@ static int ip_map_match(struct cache_head *corig, =
struct cache_head *cnew)
> {
> 	struct ip_map *orig =3D container_of(corig, struct ip_map, h);
> 	struct ip_map *new =3D container_of(cnew, struct ip_map, h);
> -	return strcmp(orig->m_class, new->m_class) =3D=3D 0 &&
> -	       ipv6_addr_equal(&orig->m_addr, &new->m_addr);
> +
> +	if (strcmp(orig->m_class, new->m_class) !=3D 0)
> +		return 0;
> +
> +	if (orig->m_sa.sa_family !=3D new->m_sa.sa_family)
> +		return 0;
> +
> +	switch (orig->m_sa.sa_family) {
> +	case AF_INET6:
> +		return ipv6_addr_equal(&orig->m_sin6.sin6_addr,
> +				       &new->m_sin6.sin6_addr);
> +
> +	case AF_VSOCK:
> +		return orig->m_svm.svm_cid =3D=3D new->m_svm.svm_cid;
> +	}
> +	return 0;
> }
> static void ip_map_init(struct cache_head *cnew, struct cache_head =
*citem)
> {
> @@ -121,7 +143,14 @@ static void ip_map_init(struct cache_head *cnew, =
struct cache_head *citem)
> 	struct ip_map *item =3D container_of(citem, struct ip_map, h);
>=20
> 	strcpy(new->m_class, item->m_class);
> -	new->m_addr =3D item->m_addr;
> +	switch (item->m_sa.sa_family) {
> +	case AF_INET6:
> +		new->m_sin6 =3D item->m_sin6;
> +		break;
> +	case AF_VSOCK:
> +		new->m_svm =3D item->m_svm;
> +		break;
> +	}
> }
> static void update(struct cache_head *cnew, struct cache_head *citem)
> {
> @@ -145,19 +174,30 @@ static void ip_map_request(struct cache_detail =
*cd,
> 				  char **bpp, int *blen)
> {
> 	char text_addr[40];
> +	struct in6_addr *addr;
> 	struct ip_map *im =3D container_of(h, struct ip_map, h);
>=20
> -	if (ipv6_addr_v4mapped(&(im->m_addr))) {
> -		snprintf(text_addr, 20, "%pI4", =
&im->m_addr.s6_addr32[3]);
> -	} else {
> -		snprintf(text_addr, 40, "%pI6", &im->m_addr);
> +	switch (im->m_sa.sa_family) {
> +	case AF_INET6:
> +		addr =3D &im->m_sin6.sin6_addr;
> +		if (ipv6_addr_v4mapped(addr)) {
> +			snprintf(text_addr, 20, "%pI4", =
&addr->s6_addr32[3]);
> +		} else {
> +			snprintf(text_addr, 40, "%pI6", addr);
> +		}
> +		break;
> +
> +	case AF_VSOCK:
> +		snprintf(text_addr, 10, "vsock:%u", im->m_svm.svm_cid);
> +		break;
> 	}
> +
> 	qword_add(bpp, blen, im->m_class);
> 	qword_add(bpp, blen, text_addr);
> 	(*bpp)[-1] =3D '\n';
> }
>=20
> -static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char =
*class, struct in6_addr *addr);
> +static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char =
*class, struct sockaddr *sap);
> static int __ip_map_update(struct cache_detail *cd, struct ip_map =
*ipm, struct unix_domain *udom, time_t expiry);
>=20
> static int ip_map_parse(struct cache_detail *cd,
> @@ -173,6 +213,7 @@ static int ip_map_parse(struct cache_detail *cd,
> 		struct sockaddr		sa;
> 		struct sockaddr_in	s4;
> 		struct sockaddr_in6	s6;
> +		struct sockaddr_vm	svm;
> 	} address;
> 	struct sockaddr_in6 sin6;
> 	int err;
> @@ -201,11 +242,15 @@ static int ip_map_parse(struct cache_detail *cd,
> 		sin6.sin6_family =3D AF_INET6;
> 		ipv6_addr_set_v4mapped(address.s4.sin_addr.s_addr,
> 				&sin6.sin6_addr);
> +		address.s6 =3D sin6;
> 		break;
> #if IS_ENABLED(CONFIG_IPV6)
> 	case AF_INET6:
> -		memcpy(&sin6, &address.s6, sizeof(sin6));
> -		break;
> +		break; /* Do nothing */
> +#endif
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +	case AF_VSOCK:
> +		break; /* Do nothing */
> #endif
> 	default:
> 		return -EINVAL;
> @@ -227,7 +272,7 @@ static int ip_map_parse(struct cache_detail *cd,
> 		dom =3D NULL;
>=20
> 	/* IPv6 scope IDs are ignored for now */
> -	ipmp =3D __ip_map_lookup(cd, class, &sin6.sin6_addr);
> +	ipmp =3D __ip_map_lookup(cd, class, &address.sa);
> 	if (ipmp) {
> 		err =3D __ip_map_update(cd, ipmp,
> 			     container_of(dom, struct unix_domain, h),
> @@ -247,7 +292,7 @@ static int ip_map_show(struct seq_file *m,
> 		       struct cache_head *h)
> {
> 	struct ip_map *im;
> -	struct in6_addr addr;
> +	struct in6_addr *addr;
> 	char *dom =3D "-no-domain-";
>=20
> 	if (h =3D=3D NULL) {
> @@ -256,33 +301,67 @@ static int ip_map_show(struct seq_file *m,
> 	}
> 	im =3D container_of(h, struct ip_map, h);
> 	/* class addr domain */
> -	addr =3D im->m_addr;
> -
> +	addr =3D &im->m_sin6.sin6_addr;
> 	if (test_bit(CACHE_VALID, &h->flags) &&
> 	    !test_bit(CACHE_NEGATIVE, &h->flags))
> 		dom =3D im->m_client->h.name;
>=20
> -	if (ipv6_addr_v4mapped(&addr)) {
> -		seq_printf(m, "%s %pI4 %s\n",
> -			im->m_class, &addr.s6_addr32[3], dom);
> -	} else {
> -		seq_printf(m, "%s %pI6 %s\n", im->m_class, &addr, dom);
> +	switch (im->m_sa.sa_family) {
> +	case AF_INET6:
> +		if (ipv6_addr_v4mapped(addr)) {
> +			seq_printf(m, "%s %pI4 %s\n",
> +				im->m_class,
> +				&addr->s6_addr32[3],
> +				dom);
> +		} else {
> +			seq_printf(m, "%s %pI6 %s\n", im->m_class, addr, =
dom);
> +		}
> +		break;
> +
> +	case AF_VSOCK:
> +		seq_printf(m, "%s %u %s\n",
> +			im->m_class, im->m_svm.svm_cid, dom);
> +		break;
> 	}
> 	return 0;
> }
>=20
> +static int __ip_map_hash(struct ip_map *ipm)
> +{
> +	int hash;
> +
> +	switch (ipm->m_sa.sa_family) {
> +	case AF_INET6:
> +		hash =3D hash_ip6(&ipm->m_sin6.sin6_addr);
> +		break;
> +	case AF_VSOCK:
> +		hash =3D hash_32(ipm->m_svm.svm_cid, IP_HASHBITS);
> +		break;
> +	default:
> +		BUG();
> +	}
> +
> +	return hash_str(ipm->m_class, IP_HASHBITS) ^ hash;
> +}
>=20
> static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char =
*class,
> -		struct in6_addr *addr)
> +		struct sockaddr *sap)
> {
> 	struct ip_map ip;
> 	struct cache_head *ch;
>=20
> 	strcpy(ip.m_class, class);
> -	ip.m_addr =3D *addr;
> -	ch =3D sunrpc_cache_lookup(cd, &ip.h,
> -				 hash_str(class, IP_HASHBITS) ^
> -				 hash_ip6(addr));
> +	switch (sap->sa_family) {
> +	case AF_INET6:
> +		ip.m_sin6 =3D *(struct sockaddr_in6 *)sap;
> +		break;
> +	case AF_VSOCK:
> +		ip.m_svm =3D *(struct sockaddr_vm *)sap;
> +		break;
> +	default:
> +		return NULL;
> +	}
> +	ch =3D sunrpc_cache_lookup(cd, &ip.h, __ip_map_hash(&ip));
>=20
> 	if (ch)
> 		return container_of(ch, struct ip_map, h);
> @@ -291,12 +370,12 @@ static struct ip_map *__ip_map_lookup(struct =
cache_detail *cd, char *class,
> }
>=20
> static inline struct ip_map *ip_map_lookup(struct net *net, char =
*class,
> -		struct in6_addr *addr)
> +		struct sockaddr *sap)
> {
> 	struct sunrpc_net *sn;
>=20
> 	sn =3D net_generic(net, sunrpc_net_id);
> -	return __ip_map_lookup(sn->ip_map_cache, class, addr);
> +	return __ip_map_lookup(sn->ip_map_cache, class, sap);
> }
>=20
> static int __ip_map_update(struct cache_detail *cd, struct ip_map =
*ipm,
> @@ -311,8 +390,7 @@ static int __ip_map_update(struct cache_detail =
*cd, struct ip_map *ipm,
> 		set_bit(CACHE_NEGATIVE, &ip.h.flags);
> 	ip.h.expiry_time =3D expiry;
> 	ch =3D sunrpc_cache_update(cd, &ip.h, &ipm->h,
> -				 hash_str(ipm->m_class, IP_HASHBITS) ^
> -				 hash_ip6(&ipm->m_addr));
> +				 __ip_map_hash(ipm));
> 	if (!ch)
> 		return -ENOMEM;
> 	cache_put(ch, cd);
> @@ -654,6 +732,7 @@ static struct group_info *unix_gid_find(kuid_t =
uid, struct svc_rqst *rqstp)
> int
> svcauth_unix_set_client(struct svc_rqst *rqstp)
> {
> +	struct sockaddr *sap;
> 	struct sockaddr_in *sin;
> 	struct sockaddr_in6 *sin6, sin6_storage;
> 	struct ip_map *ipm;
> @@ -667,10 +746,15 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
> 	case AF_INET:
> 		sin =3D svc_addr_in(rqstp);
> 		sin6 =3D &sin6_storage;
> +		sin6->sin6_family =3D AF_INET6;
> 		ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, =
&sin6->sin6_addr);
> +		sap =3D (struct sockaddr *)sin6;
> 		break;
> 	case AF_INET6:
> -		sin6 =3D svc_addr_in6(rqstp);
> +		sap =3D svc_addr(rqstp);
> +		break;
> +	case AF_VSOCK:
> +		sap =3D svc_addr(rqstp);
> 		break;
> 	default:
> 		BUG();
> @@ -683,7 +767,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
> 	ipm =3D ip_map_cached_get(xprt);
> 	if (ipm =3D=3D NULL)
> 		ipm =3D __ip_map_lookup(sn->ip_map_cache, =
rqstp->rq_server->sv_program->pg_class,
> -				    &sin6->sin6_addr);
> +				    sap);
>=20
> 	if (ipm =3D=3D NULL)
> 		return SVC_DENIED;
> --=20
> 2.9.4
>=20


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 14/14] SUNRPC: add AF_VSOCK support to auth.unix.ip
  2017-07-06 18:46   ` Abbas Naderi
@ 2017-07-10 18:05     ` Stefan Hajnoczi
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-07-10 18:05 UTC (permalink / raw)
  To: Abbas Naderi
  Cc: linux-nfs, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Jeff Layton, Chuck Lever

[-- Attachment #1: Type: text/plain, Size: 367 bytes --]

On Thu, Jul 06, 2017 at 11:46:05AM -0700, Abbas Naderi wrote:
> Without CONFIG_SUNRPC_BACKCHANNEL, many things such as “svc_vsock_bc_class” are undefined, and then used throughout the code.
> Most of the “CONFIG_SUNRPC_XPRT_VSOCK” guards in sunrpc/svcsock.c should also be guarded by CONFIG_SUNRPC_BACKCHANNEL.

Thanks, Abbas!  Will fix in v4.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 05/14] VSOCK: add tcp_read_sock()-like vsock_read_sock() function
  2017-06-30 13:23 ` [PATCH v3 05/14] VSOCK: add tcp_read_sock()-like vsock_read_sock() function Stefan Hajnoczi
@ 2017-10-31 13:35   ` Jeff Layton
  2017-11-07 13:32     ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2017-10-31 13:35 UTC (permalink / raw)
  To: Stefan Hajnoczi, linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Chuck Lever

On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> The tcp_read_sock() interface dequeues skbs and gives them to the
> caller's callback function for processing.  This interface can avoid
> data copies since the caller accesses the skb instead of using its own
> receive buffer.
> 
> This patch implements vsock_read_sock() for AF_VSOCK SOCK_STREAM
> sockets.  The implementation is only for virtio-vsock at this time, not
> for the VMware VMCI transport.  It is not zero-copy yet because the
> virtio-vsock receive queue does not consist of skbs.
> 
> The tcp_read_sock()-like interface is needed for AF_VSOCK sunrpc
> support.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/linux/virtio_vsock.h            |  4 ++
>  include/net/af_vsock.h                  |  5 +++
>  drivers/vhost/vsock.c                   |  1 +
>  net/vmw_vsock/af_vsock.c                | 16 ++++++++
>  net/vmw_vsock/virtio_transport.c        |  1 +
>  net/vmw_vsock/virtio_transport_common.c | 66 +++++++++++++++++++++++++++++++++
>  net/vmw_vsock/vmci_transport.c          |  8 ++++
>  7 files changed, 101 insertions(+)
> 
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index ab13f07..d6cf428 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -5,6 +5,7 @@
>  #include <linux/socket.h>
>  #include <net/sock.h>
>  #include <net/af_vsock.h>
> +#include <net/tcp.h> /* for sk_read_actor_t */
>  
>  #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
>  #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
> @@ -126,6 +127,9 @@ int virtio_transport_notify_send_post_enqueue(struct vsock_sock *vsk,
>  u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
>  bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
>  bool virtio_transport_stream_allow(u32 cid, u32 port);
> +int virtio_transport_stream_read_sock(struct vsock_sock *vsk,
> +				      read_descriptor_t *desc,
> +				      sk_read_actor_t recv_actor);
>  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>  				struct sockaddr_vm *addr);
>  bool virtio_transport_dgram_allow(u32 cid, u32 port);
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index f9fb566..85fa345 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -19,6 +19,7 @@
>  #include <linux/kernel.h>
>  #include <linux/workqueue.h>
>  #include <linux/vm_sockets.h>
> +#include <net/tcp.h> /* for sk_read_actor_t */
>  
>  #include "vsock_addr.h"
>  
> @@ -73,6 +74,8 @@ struct vsock_sock {
>  	void *trans;
>  };
>  
> +int vsock_read_sock(struct sock *sk, read_descriptor_t *desc,
> +		    sk_read_actor_t recv_actor);
>  s64 vsock_stream_has_data(struct vsock_sock *vsk);
>  s64 vsock_stream_has_space(struct vsock_sock *vsk);
>  void vsock_pending_work(struct work_struct *work);
> @@ -125,6 +128,8 @@ struct vsock_transport {
>  	u64 (*stream_rcvhiwat)(struct vsock_sock *);
>  	bool (*stream_is_active)(struct vsock_sock *);
>  	bool (*stream_allow)(u32 cid, u32 port);
> +	int (*stream_read_sock)(struct vsock_sock *, read_descriptor_t *desc,
> +				sk_read_actor_t recv_actor);
>  
>  	/* Notification. */
>  	int (*notify_poll_in)(struct vsock_sock *, size_t, bool *);
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 3acef3c..5128e41 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -734,6 +734,7 @@ static struct virtio_transport vhost_transport = {
>  		.stream_rcvhiwat          = virtio_transport_stream_rcvhiwat,
>  		.stream_is_active         = virtio_transport_stream_is_active,
>  		.stream_allow             = virtio_transport_stream_allow,
> +		.stream_read_sock         = virtio_transport_stream_read_sock,
>  
>  		.notify_poll_in           = virtio_transport_notify_poll_in,
>  		.notify_poll_out          = virtio_transport_notify_poll_out,
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index dfc8c51e..2ae94b5a 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -706,6 +706,22 @@ static void vsock_sk_destruct(struct sock *sk)
>  	put_cred(vsk->owner);
>  }
>  
> +/* See documentation for tcp_read_sock() */
> +int vsock_read_sock(struct sock *sk, read_descriptor_t *desc,
> +		    sk_read_actor_t recv_actor)
> +{
> +	struct vsock_sock *vsp = vsock_sk(sk);
> +
> +	if (sk->sk_type != SOCK_STREAM)
> +		return -EOPNOTSUPP;
> +
> +	if (sk->sk_state != SS_CONNECTED && sk->sk_state != SS_DISCONNECTING)
> +		return -ENOTCONN;
> +
> +	return transport->stream_read_sock(vsp, desc, recv_actor);
> +}
> +EXPORT_SYMBOL(vsock_read_sock);
> +
>  static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  {
>  	int err;
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 403d86e80..8264307 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -520,6 +520,7 @@ static struct virtio_transport virtio_transport = {
>  		.stream_rcvhiwat          = virtio_transport_stream_rcvhiwat,
>  		.stream_is_active         = virtio_transport_stream_is_active,
>  		.stream_allow             = virtio_transport_stream_allow,
> +		.stream_read_sock         = virtio_transport_stream_read_sock,
>  
>  		.notify_poll_in           = virtio_transport_notify_poll_in,
>  		.notify_poll_out          = virtio_transport_notify_poll_out,
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 18e2479..510d9e1 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -316,6 +316,72 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
>  EXPORT_SYMBOL_GPL(virtio_transport_stream_dequeue);
>  
>  int
> +virtio_transport_stream_read_sock(struct vsock_sock *vsk,
> +				  read_descriptor_t *desc,
> +				  sk_read_actor_t recv_actor)
> +{
> +	struct virtio_vsock_sock *vvs;
> +	int ret = 0;
> +
> +	vvs = vsk->trans;
> +
> +	spin_lock_bh(&vvs->rx_lock);
> +	while (!list_empty(&vvs->rx_queue)) {
> +		struct virtio_vsock_pkt *pkt;
> +		struct sk_buff *skb;
> +		size_t len;
> +		int used;
> +
> +		pkt = list_first_entry(&vvs->rx_queue,
> +				       struct virtio_vsock_pkt, list);
> +
> +		/* sk_lock is held by caller so no one else can dequeue.
> +		 * Unlock rx_lock so recv_actor() can sleep.
> +		 */
> +		spin_unlock_bh(&vvs->rx_lock);
> +
> +		len = pkt->len - pkt->off;
> +		skb = alloc_skb(len, GFP_KERNEL);
> +		if (!skb) {
> +			ret = -ENOMEM;
> +			goto out_nolock;
> +		}
> +
> +		memcpy(skb_put(skb, len),
> +		       pkt->buf + pkt->off,
> +		       len);
> +
> +		used = recv_actor(desc, skb, 0, len);
> +
> +		kfree_skb(skb);
> +
> +		spin_lock_bh(&vvs->rx_lock);
> +
> +		if (used > 0) {
> +			ret += used;
> +			pkt->off += used;
> +			if (pkt->off == pkt->len) {
> +				virtio_transport_dec_rx_pkt(vvs, pkt);
> +				list_del(&pkt->list);
> +				virtio_transport_free_pkt(pkt);
> +			}
> +		}
> +
> +		if (used <= 0 || !desc->count)
> +			break;
> +	}
> +	spin_unlock_bh(&vvs->rx_lock);
> +
> +out_nolock:
> +	if (ret > 0)
> +		virtio_transport_send_credit_update(vsk,
> +				VIRTIO_VSOCK_TYPE_STREAM, NULL);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_transport_stream_read_sock);
> +
> +int
>  virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
>  			       struct msghdr *msg,
>  			       size_t len, int flags)
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index 10ae782..dfbdb4c 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -646,6 +646,13 @@ static bool vmci_transport_stream_allow(u32 cid, u32 port)
>  	return true;
>  }
>  
> +static int vmci_transport_stream_read_sock(struct vsock_sock *vsk,
> +					   read_descriptor_t *desc,
> +					   sk_read_actor_t recv_actor)
> +{
> +	return -EOPNOTSUPP; /* not yet implemented */
> +}
> +
>  /* This is invoked as part of a tasklet that's scheduled when the VMCI
>   * interrupt fires.  This is run in bottom-half context but it defers most of
>   * its work to the packet handling work queue.
> @@ -2061,6 +2068,7 @@ static const struct vsock_transport vmci_transport = {
>  	.stream_rcvhiwat = vmci_transport_stream_rcvhiwat,
>  	.stream_is_active = vmci_transport_stream_is_active,
>  	.stream_allow = vmci_transport_stream_allow,
> +	.stream_read_sock = vmci_transport_stream_read_sock,
>  	.notify_poll_in = vmci_transport_notify_poll_in,
>  	.notify_poll_out = vmci_transport_notify_poll_out,
>  	.notify_recv_init = vmci_transport_notify_recv_init,

Do we need this for the hyperv driver as well?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 07/14] SUNRPC: drop unnecessary svc_bc_tcp_create() helper
  2017-06-30 13:23 ` [PATCH v3 07/14] SUNRPC: drop unnecessary svc_bc_tcp_create() helper Stefan Hajnoczi
@ 2017-10-31 13:55   ` Jeff Layton
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff Layton @ 2017-10-31 13:55 UTC (permalink / raw)
  To: Stefan Hajnoczi, linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Chuck Lever

On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> svc_bc_tcp_create() is a helper function that simply calls
> svc_bc_create_socket() with an added IPPROTO_TCP argument.
> 
> svc_bc_create_socket() then checks that the protocol argument is indeed
> IPPROTO_TCP.
> 
> This isn't necessary since svc_bc_tcp_create() is the only
> svc_bc_create_socket() caller.  The next patch adds a second caller for
> AF_VSOCK where IPPROTO_TCP will not be used.
> 
> Scrap this scheme and just call svc_bc_create_socket() directly.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 21 +++------------------
>  1 file changed, 3 insertions(+), 18 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..394e7a2 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -71,7 +71,7 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *, int,
>  					  struct net *, struct sockaddr *,
>  					  int, int);
>  #if defined(CONFIG_SUNRPC_BACKCHANNEL)
> -static struct svc_xprt *svc_bc_create_socket(struct svc_serv *, int,
> +static struct svc_xprt *svc_bc_create_socket(struct svc_serv *,
>  					     struct net *, struct sockaddr *,
>  					     int, int);
>  static void svc_bc_sock_free(struct svc_xprt *xprt);
> @@ -1212,25 +1212,17 @@ static struct svc_xprt *svc_tcp_create(struct svc_serv *serv,
>  }
>  
>  #if defined(CONFIG_SUNRPC_BACKCHANNEL)
> -static struct svc_xprt *svc_bc_create_socket(struct svc_serv *, int,
> +static struct svc_xprt *svc_bc_create_socket(struct svc_serv *,
>  					     struct net *, struct sockaddr *,
>  					     int, int);
>  static void svc_bc_sock_free(struct svc_xprt *xprt);
>  
> -static struct svc_xprt *svc_bc_tcp_create(struct svc_serv *serv,
> -				       struct net *net,
> -				       struct sockaddr *sa, int salen,
> -				       int flags)
> -{
> -	return svc_bc_create_socket(serv, IPPROTO_TCP, net, sa, salen, flags);
> -}
> -
>  static void svc_bc_tcp_sock_detach(struct svc_xprt *xprt)
>  {
>  }
>  
>  static struct svc_xprt_ops svc_tcp_bc_ops = {
> -	.xpo_create = svc_bc_tcp_create,
> +	.xpo_create = svc_bc_create_socket,
>  	.xpo_detach = svc_bc_tcp_sock_detach,
>  	.xpo_free = svc_bc_sock_free,
>  	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
> @@ -1615,7 +1607,6 @@ static void svc_sock_free(struct svc_xprt *xprt)
>   * Create a back channel svc_xprt which shares the fore channel socket.
>   */
>  static struct svc_xprt *svc_bc_create_socket(struct svc_serv *serv,
> -					     int protocol,
>  					     struct net *net,
>  					     struct sockaddr *sin, int len,
>  					     int flags)
> @@ -1623,12 +1614,6 @@ static struct svc_xprt *svc_bc_create_socket(struct svc_serv *serv,
>  	struct svc_sock *svsk;
>  	struct svc_xprt *xprt;
>  
> -	if (protocol != IPPROTO_TCP) {
> -		printk(KERN_WARNING "svc: only TCP sockets"
> -			" supported on shared back channel\n");
> -		return ERR_PTR(-EINVAL);
> -	}
> -
>  	svsk = kzalloc(sizeof(*svsk), GFP_KERNEL);
>  	if (!svsk)
>  		return ERR_PTR(-ENOMEM);

This one can go in ahead of the rest, I think. Trond or Bruce, do either
of you want to pick this up for the next merge window?

Reviewed-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-06-30 13:23 ` [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c Stefan Hajnoczi
@ 2017-10-31 14:10   ` Jeff Layton
  2017-11-07 13:31     ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2017-10-31 14:10 UTC (permalink / raw)
  To: Stefan Hajnoczi, linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Chuck Lever

On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> Allow creation of AF_VSOCK service xprts.  This is needed for the
> "vsock-bc" backchannel.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/linux/sunrpc/svc_xprt.h | 12 ++++++++++++
>  net/sunrpc/svc_xprt.c           | 18 ++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index ddb7f94..938f9ee 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -8,6 +8,7 @@
>  #define SUNRPC_SVC_XPRT_H
>  
>  #include <linux/sunrpc/svc.h>
> +#include <linux/vm_sockets.h>
>  
>  struct module;
>  
> @@ -156,12 +157,15 @@ static inline unsigned short svc_addr_port(const struct sockaddr *sa)
>  {
>  	const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
>  	const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
> +	const struct sockaddr_vm *svm = (const struct sockaddr_vm *)sa;
>  
>  	switch (sa->sa_family) {
>  	case AF_INET:
>  		return ntohs(sin->sin_port);
>  	case AF_INET6:
>  		return ntohs(sin6->sin6_port);
> +	case AF_VSOCK:
> +		return svm->svm_port;
>  	}
>  
>  	return 0;
> @@ -174,6 +178,8 @@ static inline size_t svc_addr_len(const struct sockaddr *sa)
>  		return sizeof(struct sockaddr_in);
>  	case AF_INET6:
>  		return sizeof(struct sockaddr_in6);
> +	case AF_VSOCK:
> +		return sizeof(struct sockaddr_vm);
>  	}
>  	BUG();
>  }
> @@ -193,6 +199,7 @@ static inline char *__svc_print_addr(const struct sockaddr *addr,
>  {
>  	const struct sockaddr_in *sin = (const struct sockaddr_in *)addr;
>  	const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)addr;
> +	const struct sockaddr_vm *svm = (const struct sockaddr_vm *)addr;
>  
>  	switch (addr->sa_family) {
>  	case AF_INET:
> @@ -206,6 +213,11 @@ static inline char *__svc_print_addr(const struct sockaddr *addr,
>  			ntohs(sin6->sin6_port));
>  		break;
>  
> +	case AF_VSOCK:
> +		snprintf(buf, len, "%u, port=%u",
> +			 svm->svm_cid, svm->svm_port);
> +		break;
> +
>  	default:
>  		snprintf(buf, len, "unknown address type: %d", addr->sa_family);
>  		break;
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 7bfe1fb..2a8ccc1 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -10,6 +10,7 @@
>  #include <linux/kthread.h>
>  #include <linux/slab.h>
>  #include <net/sock.h>
> +#include <net/af_vsock.h>
>  #include <linux/sunrpc/addr.h>
>  #include <linux/sunrpc/stats.h>
>  #include <linux/sunrpc/svc_xprt.h>
> @@ -195,6 +196,13 @@ static struct svc_xprt *__svc_xpo_create(struct svc_xprt_class *xcl,
>  		.sin6_port		= htons(port),
>  	};
>  #endif
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +	struct sockaddr_vm svm = {
> +		.svm_family		= AF_VSOCK,
> +		.svm_cid		= VMADDR_CID_ANY,
> +		.svm_port		= port,
> +	};
> +#endif
>  	struct sockaddr *sap;
>  	size_t len;
>  
> @@ -209,6 +217,12 @@ static struct svc_xprt *__svc_xpo_create(struct svc_xprt_class *xcl,
>  		len = sizeof(sin6);
>  		break;
>  #endif
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +	case AF_VSOCK:
> +		sap = (struct sockaddr *)&svm;
> +		len = sizeof(svm);
> +		break;
> +#endif
>  	default:
>  		return ERR_PTR(-EAFNOSUPPORT);
>  	}
> @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
>  	case AF_INET6:
>  		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
>  			< PROT_SOCK;
> +	case AF_VSOCK:
> +		return ((struct sockaddr_vm *)sin)->svm_port <=
> +			LAST_RESERVED_PORT;
> +
>  	default:
>  		return 0;
>  	}

Does vsock even have the concept of a privileged port? I would imagine
that root in a guest VM would carry no particular significance from an
export security standpoint

Since you're defining a new transport here, it might be nice write the
RFCs to avoid that distinction, if possible.

Note that RDMA just has svc_port_is_privileged always return 1.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-10-31 14:10   ` Jeff Layton
@ 2017-11-07 13:31     ` Stefan Hajnoczi
  2017-11-07 14:01       ` Jeff Layton
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-11-07 13:31 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Abbas Naderi, Anna Schumaker, Trond Myklebust,
	J. Bruce Fields, Chuck Lever

[-- Attachment #1: Type: text/plain, Size: 912 bytes --]

On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
> On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> > @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
> >  	case AF_INET6:
> >  		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
> >  			< PROT_SOCK;
> > +	case AF_VSOCK:
> > +		return ((struct sockaddr_vm *)sin)->svm_port <=
> > +			LAST_RESERVED_PORT;
> > +
> >  	default:
> >  		return 0;
> >  	}
> 
> Does vsock even have the concept of a privileged port? I would imagine
> that root in a guest VM would carry no particular significance from an
> export security standpoint
> 
> Since you're defining a new transport here, it might be nice write the
> RFCs to avoid that distinction, if possible.
> 
> Note that RDMA just has svc_port_is_privileged always return 1.

AF_VSOCK has the same 0-1023 privileged port range as TCP.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 05/14] VSOCK: add tcp_read_sock()-like vsock_read_sock() function
  2017-10-31 13:35   ` Jeff Layton
@ 2017-11-07 13:32     ` Stefan Hajnoczi
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-11-07 13:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Abbas Naderi, Anna Schumaker, Trond Myklebust,
	J. Bruce Fields, Chuck Lever

[-- Attachment #1: Type: text/plain, Size: 744 bytes --]

On Tue, Oct 31, 2017 at 09:35:12AM -0400, Jeff Layton wrote:
> On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> > @@ -2061,6 +2068,7 @@ static const struct vsock_transport vmci_transport = {
> >  	.stream_rcvhiwat = vmci_transport_stream_rcvhiwat,
> >  	.stream_is_active = vmci_transport_stream_is_active,
> >  	.stream_allow = vmci_transport_stream_allow,
> > +	.stream_read_sock = vmci_transport_stream_read_sock,
> >  	.notify_poll_in = vmci_transport_notify_poll_in,
> >  	.notify_poll_out = vmci_transport_notify_poll_out,
> >  	.notify_recv_init = vmci_transport_notify_recv_init,
> 
> Do we need this for the hyperv driver as well?

Yes, thank you for pointing it out.  Will add in the next revision.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 06/14] SUNRPC: add AF_VSOCK support to xprtsock.c
  2017-06-30 13:23 ` [PATCH v3 06/14] SUNRPC: add AF_VSOCK support to xprtsock.c Stefan Hajnoczi
@ 2017-11-07 13:46   ` Jeff Layton
  2017-11-14 16:45     ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2017-11-07 13:46 UTC (permalink / raw)
  To: Stefan Hajnoczi, linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Chuck Lever

On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/linux/sunrpc/xprt.h |   1 +
>  net/sunrpc/xprtsock.c       | 385 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 381 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index eab1c74..c038d8a 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -170,6 +170,7 @@ enum xprt_transports {
>  	XPRT_TRANSPORT_RDMA	= 256,
>  	XPRT_TRANSPORT_BC_RDMA	= XPRT_TRANSPORT_RDMA | XPRT_TRANSPORT_BC,
>  	XPRT_TRANSPORT_LOCAL	= 257,
> +	XPRT_TRANSPORT_VSOCK	= 258,
>  };
>  
>  struct rpc_xprt {
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index fd0c8b1..cc343b91 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -46,6 +46,7 @@
>  #include <net/checksum.h>
>  #include <net/udp.h>
>  #include <net/tcp.h>
> +#include <net/af_vsock.h>
>  
>  #include <trace/events/sunrpc.h>
>  
> @@ -271,6 +272,13 @@ static void xs_format_common_peer_addresses(struct rpc_xprt *xprt)
>  		sin6 = xs_addr_in6(xprt);
>  		snprintf(buf, sizeof(buf), "%pi6", &sin6->sin6_addr);
>  		break;
> +	case AF_VSOCK:
> +		(void)rpc_ntop(sap, buf, sizeof(buf));
> +		xprt->address_strings[RPC_DISPLAY_ADDR] =
> +						kstrdup(buf, GFP_KERNEL);
> +		snprintf(buf, sizeof(buf), "%08x",
> +			 ((struct sockaddr_vm *)sap)->svm_cid);
> +		break;
>  	default:
>  		BUG();
>  	}
> @@ -1881,21 +1889,30 @@ static int xs_bind(struct sock_xprt *transport, struct socket *sock)
>  			nloop++;
>  	} while (err == -EADDRINUSE && nloop != 2);
>  
> -	if (myaddr.ss_family == AF_INET)
> +	switch (myaddr.ss_family) {
> +	case AF_INET:
>  		dprintk("RPC:       %s %pI4:%u: %s (%d)\n", __func__,
>  				&((struct sockaddr_in *)&myaddr)->sin_addr,
>  				port, err ? "failed" : "ok", err);
> -	else
> +		break;
> +	case AF_INET6:
>  		dprintk("RPC:       %s %pI6:%u: %s (%d)\n", __func__,
>  				&((struct sockaddr_in6 *)&myaddr)->sin6_addr,
>  				port, err ? "failed" : "ok", err);
> +		break;
> +	case AF_VSOCK:
> +		dprintk("RPC:       %s %u:%u: %s (%d)\n", __func__,
> +				((struct sockaddr_vm *)&myaddr)->svm_cid,
> +				port, err ? "failed" : "ok", err);
> +		break;
> +	}
>  	return err;
>  }
>  
>  /*
> - * We don't support autobind on AF_LOCAL sockets
> + * We don't support autobind on AF_LOCAL and AF_VSOCK sockets
>   */
> -static void xs_local_rpcbind(struct rpc_task *task)
> +static void xs_dummy_rpcbind(struct rpc_task *task)
>  {
>  	xprt_set_bound(task->tk_xprt);
>  }
> @@ -1932,6 +1949,14 @@ static inline void xs_reclassify_socket6(struct socket *sock)
>  		&xs_slock_key[1], "sk_lock-AF_INET6-RPC", &xs_key[1]);
>  }
>  
> +static inline void xs_reclassify_socket_vsock(struct socket *sock)
> +{
> +	struct sock *sk = sock->sk;
> +
> +	sock_lock_init_class_and_name(sk, "slock-AF_VSOCK-RPC",
> +		&xs_slock_key[1], "sk_lock-AF_VSOCK-RPC", &xs_key[1]);
> +}
> +
>  static inline void xs_reclassify_socket(int family, struct socket *sock)
>  {
>  	if (WARN_ON_ONCE(!sock_allow_reclassification(sock->sk)))
> @@ -1947,6 +1972,9 @@ static inline void xs_reclassify_socket(int family, struct socket *sock)
>  	case AF_INET6:
>  		xs_reclassify_socket6(sock);
>  		break;
> +	case AF_VSOCK:
> +		xs_reclassify_socket_vsock(sock);
> +		break;
>  	}
>  }
>  #else
> @@ -2743,7 +2771,7 @@ static struct rpc_xprt_ops xs_local_ops = {
>  	.reserve_xprt		= xprt_reserve_xprt,
>  	.release_xprt		= xs_tcp_release_xprt,
>  	.alloc_slot		= xprt_alloc_slot,
> -	.rpcbind		= xs_local_rpcbind,
> +	.rpcbind		= xs_dummy_rpcbind,
>  	.set_port		= xs_local_set_port,
>  	.connect		= xs_local_connect,
>  	.buf_alloc		= rpc_malloc,
> @@ -2836,6 +2864,10 @@ static int xs_init_anyaddr(const int family, struct sockaddr *sap)
>  		.sin6_family		= AF_INET6,
>  		.sin6_addr		= IN6ADDR_ANY_INIT,
>  	};
> +	static const struct sockaddr_vm svm = {
> +		.svm_family		= AF_VSOCK,
> +		.svm_cid		= VMADDR_CID_ANY,
> +	};
>  
>  	switch (family) {
>  	case AF_LOCAL:
> @@ -2846,6 +2878,9 @@ static int xs_init_anyaddr(const int family, struct sockaddr *sap)
>  	case AF_INET6:
>  		memcpy(sap, &sin6, sizeof(sin6));
>  		break;
> +	case AF_VSOCK:
> +		memcpy(sap, &svm, sizeof(svm));
> +		break;
>  	default:
>  		dprintk("RPC:       %s: Bad address family\n", __func__);
>  		return -EAFNOSUPPORT;
> @@ -3203,6 +3238,330 @@ static struct rpc_xprt *xs_setup_bc_tcp(struct xprt_create *args)
>  	return ret;
>  }
>  
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +/**
> + * xs_vsock_state_change - callback to handle vsock socket state changes
> + * @sk: socket whose state has changed
> + *
> + */
> +static void xs_vsock_state_change(struct sock *sk)
> +{
> +	struct rpc_xprt *xprt;
> +
> +	read_lock_bh(&sk->sk_callback_lock);
> +	if (!(xprt = xprt_from_sock(sk)))
> +		goto out;
> +	dprintk("RPC:       %s client %p...\n", __func__, xprt);
> +	dprintk("RPC:       state %x conn %d dead %d zapped %d sk_shutdown %d\n",
> +			sk->sk_state, xprt_connected(xprt),
> +			sock_flag(sk, SOCK_DEAD),
> +			sock_flag(sk, SOCK_ZAPPED),
> +			sk->sk_shutdown);
> +
> +	trace_rpc_socket_state_change(xprt, sk->sk_socket);
> +
> +	switch (sk->sk_state) {
> +	case SS_CONNECTING:
> +		/* Do nothing */
> +		break;
> +
> +	case SS_CONNECTED:
> +		spin_lock(&xprt->transport_lock);
> +		if (!xprt_test_and_set_connected(xprt)) {
> +			xs_stream_reset_state(xprt, vsock_read_sock);
> +			xprt->connect_cookie++;
> +
> +			xprt_wake_pending_tasks(xprt, -EAGAIN);
> +		}
> +		spin_unlock(&xprt->transport_lock);
> +		break;
> +
> +	case SS_DISCONNECTING:
> +		/* TODO do we need to distinguish between various shutdown (client-side/server-side)? */
> +		/* The client initiated a shutdown of the socket */
> +		xprt->connect_cookie++;
> +		xprt->reestablish_timeout = 0;
> +		set_bit(XPRT_CLOSING, &xprt->state);
> +		smp_mb__before_atomic();
> +		clear_bit(XPRT_CONNECTED, &xprt->state);
> +		clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> +		smp_mb__after_atomic();
> +		break;
> +
> +	case SS_UNCONNECTED:
> +		xs_sock_mark_closed(xprt);
> +		break;
> +	}
> +
> + out:
> +	read_unlock_bh(&sk->sk_callback_lock);
> +}
> +
> +/**
> + * xs_vsock_error_report - callback to handle vsock socket state errors
> + * @sk: socket
> + *
> + * Note: we don't call sock_error() since there may be a rpc_task
> + * using the socket, and so we don't want to clear sk->sk_err.
> + */
> +static void xs_vsock_error_report(struct sock *sk)
> +{
> +	struct rpc_xprt *xprt;
> +	int err;
> +
> +	read_lock_bh(&sk->sk_callback_lock);
> +	if (!(xprt = xprt_from_sock(sk)))
> +		goto out;
> +
> +	err = -sk->sk_err;
> +	if (err == 0)
> +		goto out;
> +	/* Is this a reset event? */
> +	if (sk->sk_state == SS_UNCONNECTED)
> +		xs_sock_mark_closed(xprt);
> +	dprintk("RPC:       %s client %p, error=%d...\n",
> +			__func__, xprt, -err);
> +	trace_rpc_socket_error(xprt, sk->sk_socket, err);
> +	xprt_wake_pending_tasks(xprt, err);
> + out:
> +	read_unlock_bh(&sk->sk_callback_lock);
> +}

Hmm ok...so we have this to avoid some TCP specific stuff in
xs_error_report, I guess?

I wonder if AF_LOCAL transport should be using the function above,
rather than xs_error_report? If so, maybe we should rename:

    xs_error_report -> xs_tcp_error_report
    xs_vsock_error_report -> xs_stream_error_report

Might be good to do that cleanup first as a preparatory patch.

> +
> +/**
> + * xs_vsock_finish_connecting - initialize and connect socket
> + */
> +static int xs_vsock_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> +{
> +	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
> +	int ret = -ENOTCONN;
> +
> +	if (!transport->inet) {
> +		struct sock *sk = sock->sk;
> +
> +		write_lock_bh(&sk->sk_callback_lock);
> +
> +		xs_save_old_callbacks(transport, sk);
> +
> +		sk->sk_user_data = xprt;
> +		sk->sk_data_ready = xs_data_ready;
> +		sk->sk_state_change = xs_vsock_state_change;
> +		sk->sk_write_space = xs_tcp_write_space;

Might should rename xs_tcp_write_space to xs_stream_write_space?

> +		sk->sk_error_report = xs_vsock_error_report;
> +		sk->sk_allocation = GFP_ATOMIC;

Why GFP_ATOMIC here? The other finish routines use GFP_NOIO.

> +
> +		xprt_clear_connected(xprt);
> +
> +		/* Reset to new socket */
> +		transport->sock = sock;
> +		transport->inet = sk;
> +
> +		write_unlock_bh(&sk->sk_callback_lock);
> +	}
> +
> +	if (!xprt_bound(xprt))
> +		goto out;
> +
> +	xs_set_memalloc(xprt);
> +
> +	/* Tell the socket layer to start connecting... */
> +	xprt->stat.connect_count++;
> +	xprt->stat.connect_start = jiffies;
> +	ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
> +	switch (ret) {
> +	case 0:
> +		xs_set_srcport(transport, sock);
> +	case -EINPROGRESS:
> +		/* SYN_SENT! */
> +		if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
> +			xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
> +	}
> +out:
> +	return ret;
> +}
> +
> +/**
> + * xs_vsock_setup_socket - create a vsock socket and connect to a remote endpoint
> + *
> + * Invoked by a work queue tasklet.
> + */
> +static void xs_vsock_setup_socket(struct work_struct *work)
> +{
> +	struct sock_xprt *transport =
> +		container_of(work, struct sock_xprt, connect_worker.work);
> +	struct socket *sock = transport->sock;
> +	struct rpc_xprt *xprt = &transport->xprt;
> +	int status = -EIO;
> +
> +	if (!sock) {
> +		sock = xs_create_sock(xprt, transport,
> +				xs_addr(xprt)->sa_family, SOCK_STREAM,
> +				0, true);
> +		if (IS_ERR(sock)) {
> +			status = PTR_ERR(sock);
> +			goto out;
> +		}
> +	}
> +
> +	dprintk("RPC:       worker connecting xprt %p via %s to "
> +				"%s (port %s)\n", xprt,
> +			xprt->address_strings[RPC_DISPLAY_PROTO],
> +			xprt->address_strings[RPC_DISPLAY_ADDR],
> +			xprt->address_strings[RPC_DISPLAY_PORT]);
> +
> +	status = xs_vsock_finish_connecting(xprt, sock);
> +	trace_rpc_socket_connect(xprt, sock, status);
> +	dprintk("RPC:       %p connect status %d connected %d sock state %d\n",
> +			xprt, -status, xprt_connected(xprt),
> +			sock->sk->sk_state);
> +	switch (status) {
> +	default:
> +		printk("%s: connect returned unhandled error %d\n",
> +			__func__, status);
> +	case -EADDRNOTAVAIL:
> +		/* We're probably in TIME_WAIT. Get rid of existing socket,
> +		 * and retry
> +		 */
> +		xs_tcp_force_close(xprt);
> +		break;
> +	case 0:
> +	case -EINPROGRESS:
> +	case -EALREADY:
> +		xprt_unlock_connect(xprt, transport);
> +		xprt_clear_connecting(xprt);
> +		return;
> +	case -EINVAL:
> +		/* Happens, for instance, if the user specified a link
> +		 * local IPv6 address without a scope-id.
> +		 */
> +	case -ECONNREFUSED:
> +	case -ECONNRESET:
> +	case -ENETUNREACH:
> +	case -EADDRINUSE:
> +	case -ENOBUFS:
> +		/* retry with existing socket, after a delay */
> +		xs_tcp_force_close(xprt);
> +		goto out;
> +	}
> +	status = -EAGAIN;
> +out:
> +	xprt_unlock_connect(xprt, transport);
> +	xprt_clear_connecting(xprt);
> +	xprt_wake_pending_tasks(xprt, status);
> +}
> +
> +/**
> + * xs_vsock_print_stats - display vsock socket-specifc stats
> + * @xprt: rpc_xprt struct containing statistics
> + * @seq: output file
> + *
> + */
> +static void xs_vsock_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
> +{
> +	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
> +	long idle_time = 0;
> +
> +	if (xprt_connected(xprt))
> +		idle_time = (long)(jiffies - xprt->last_used) / HZ;
> +
> +	seq_printf(seq, "\txprt:\tvsock %u %lu %lu %lu %ld %lu %lu %lu "
> +			"%llu %llu %lu %llu %llu\n",
> +			transport->srcport,
> +			xprt->stat.bind_count,
> +			xprt->stat.connect_count,
> +			xprt->stat.connect_time,
> +			idle_time,
> +			xprt->stat.sends,
> +			xprt->stat.recvs,
> +			xprt->stat.bad_xids,
> +			xprt->stat.req_u,
> +			xprt->stat.bklog_u,
> +			xprt->stat.max_slots,
> +			xprt->stat.sending_u,
> +			xprt->stat.pending_u);
> +}
> +
> +static struct rpc_xprt_ops xs_vsock_ops = {
> +	.reserve_xprt		= xprt_reserve_xprt,
> +	.release_xprt		= xs_tcp_release_xprt,
> +	.alloc_slot		= xprt_lock_and_alloc_slot,
> +	.rpcbind		= xs_dummy_rpcbind,
> +	.set_port		= xs_set_port,
> +	.connect		= xs_connect,
> +	.buf_alloc		= rpc_malloc,
> +	.buf_free		= rpc_free,
> +	.send_request		= xs_tcp_send_request,
> +	.set_retrans_timeout	= xprt_set_retrans_timeout_def,
> +	.close			= xs_tcp_shutdown,
> +	.destroy		= xs_destroy,
> +	.print_stats		= xs_vsock_print_stats,
> +};
> +
> +static const struct rpc_timeout xs_vsock_default_timeout = {
> +	.to_initval = 60 * HZ,
> +	.to_maxval = 60 * HZ,
> +	.to_retries = 2,
> +};
> +
> +/**
> + * xs_setup_vsock - Set up transport to use a vsock socket
> + * @args: rpc transport creation arguments
> + *
> + */
> +static struct rpc_xprt *xs_setup_vsock(struct xprt_create *args)
> +{
> +	struct sockaddr_vm *addr = (struct sockaddr_vm *)args->dstaddr;
> +	struct sock_xprt *transport;
> +	struct rpc_xprt *xprt;
> +	struct rpc_xprt *ret;
> +
> +	xprt = xs_setup_xprt(args, xprt_tcp_slot_table_entries,
> +			     xprt_max_tcp_slot_table_entries);
> +	if (IS_ERR(xprt))
> +		return xprt;
> +	transport = container_of(xprt, struct sock_xprt, xprt);
> +
> +	xprt->prot = 0;
> +	xprt->tsh_size = sizeof(rpc_fraghdr) / sizeof(u32);
> +	xprt->max_payload = RPC_MAX_FRAGMENT_SIZE;
> +
> +	xprt->bind_timeout = XS_BIND_TO;
> +	xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
> +	xprt->idle_timeout = XS_IDLE_DISC_TO;
> +
> +	xprt->ops = &xs_vsock_ops;
> +	xprt->timeout = &xs_vsock_default_timeout;
> +
> +	INIT_WORK(&transport->recv_worker, xs_stream_data_receive_workfn);
> +	INIT_DELAYED_WORK(&transport->connect_worker, xs_vsock_setup_socket);
> +
> +	switch (addr->svm_family) {
> +	case AF_VSOCK:
> +		if (addr->svm_port == 0) {
> +			dprintk("RPC:       autobind not supported with AF_VSOCK\n");
> +			ret = ERR_PTR(-EINVAL);
> +			goto out_err;
> +		}
> +		xprt_set_bound(xprt);
> +		xs_format_peer_addresses(xprt, "vsock", "vsock" /* TODO register official netid? */);
> +		break;
> +	default:
> +		ret = ERR_PTR(-EAFNOSUPPORT);
> +		goto out_err;
> +	}
> +
> +	dprintk("RPC:       set up xprt to %s (port %s) via AF_VSOCK\n",
> +		xprt->address_strings[RPC_DISPLAY_ADDR],
> +		xprt->address_strings[RPC_DISPLAY_PORT]);
> +
> +	if (try_module_get(THIS_MODULE))
> +		return xprt;
> +	ret = ERR_PTR(-EINVAL);
> +out_err:
> +	xs_xprt_free(xprt);
> +	return ret;
> +}
> +#endif
> +
>  static struct xprt_class	xs_local_transport = {
>  	.list		= LIST_HEAD_INIT(xs_local_transport.list),
>  	.name		= "named UNIX socket",
> @@ -3235,6 +3594,16 @@ static struct xprt_class	xs_bc_tcp_transport = {
>  	.setup		= xs_setup_bc_tcp,
>  };
>  
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +static struct xprt_class	xs_vsock_transport = {
> +	.list		= LIST_HEAD_INIT(xs_vsock_transport.list),
> +	.name		= "vsock",
> +	.owner		= THIS_MODULE,
> +	.ident		= XPRT_TRANSPORT_VSOCK,
> +	.setup		= xs_setup_vsock,
> +};
> +#endif
> +
>  /**
>   * init_socket_xprt - set up xprtsock's sysctls, register with RPC client
>   *
> @@ -3250,6 +3619,9 @@ int init_socket_xprt(void)
>  	xprt_register_transport(&xs_udp_transport);
>  	xprt_register_transport(&xs_tcp_transport);
>  	xprt_register_transport(&xs_bc_tcp_transport);
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +	xprt_register_transport(&xs_vsock_transport);
> +#endif
>  
>  	return 0;
>  }
> @@ -3271,6 +3643,9 @@ void cleanup_socket_xprt(void)
>  	xprt_unregister_transport(&xs_udp_transport);
>  	xprt_unregister_transport(&xs_tcp_transport);
>  	xprt_unregister_transport(&xs_bc_tcp_transport);
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +	xprt_unregister_transport(&xs_vsock_transport);
> +#endif
>  }
>  
>  static int param_set_uint_minmax(const char *val,

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-11-07 13:31     ` Stefan Hajnoczi
@ 2017-11-07 14:01       ` Jeff Layton
  2017-11-16 15:25         ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2017-11-07 14:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-nfs, Abbas Naderi, Anna Schumaker, Trond Myklebust,
	J. Bruce Fields, Chuck Lever

On Tue, 2017-11-07 at 13:31 +0000, Stefan Hajnoczi wrote:
> On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
> > On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> > > @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
> > >  	case AF_INET6:
> > >  		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
> > >  			< PROT_SOCK;
> > > +	case AF_VSOCK:
> > > +		return ((struct sockaddr_vm *)sin)->svm_port <=
> > > +			LAST_RESERVED_PORT;
> > > +
> > >  	default:
> > >  		return 0;
> > >  	}
> > 
> > Does vsock even have the concept of a privileged port? I would imagine
> > that root in a guest VM would carry no particular significance from an
> > export security standpoint
> > 
> > Since you're defining a new transport here, it might be nice write the
> > RFCs to avoid that distinction, if possible.
> > 
> > Note that RDMA just has svc_port_is_privileged always return 1.
> 
> AF_VSOCK has the same 0-1023 privileged port range as TCP.
> 

But why? And, given that you have 32-bits for a port with AF_VSOCK vs
the 16 bits on an AF_INET/AF_INET6, why is the range so pitifully small?

Reserved ports are a bit of a dinosaur holdover from when being root on
a machine on the Internet meant something. With v4.1 it's much less of
an issue, but in the "olden days", reserved port exhaustion could be a
real problem.

Mandating low ports can also be a problem in other way. Some well known
services use ports in the ephemeral range, and if your service starts
late and someone else has taken the port for an ephemeral one, you're
out of luck.

I think we have to ask ourselves:

Should the ability to open a low port inside of a VM carry any
significance at all to an RPC server? I'd suggest not, and I think it'd
be good to word the RFC to make that explicitly clear.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 13/14] SUNRPC: vsock svcsock support
  2017-06-30 13:23 ` [PATCH v3 13/14] SUNRPC: vsock svcsock support Stefan Hajnoczi
@ 2017-11-07 14:12   ` Jeff Layton
  2017-11-14 14:20     ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2017-11-07 14:12 UTC (permalink / raw)
  To: Stefan Hajnoczi, linux-nfs
  Cc: Abbas Naderi, Anna Schumaker, Trond Myklebust, J. Bruce Fields,
	Chuck Lever

On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 214 +++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 190 insertions(+), 24 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index e67b097..86cce8f 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -42,7 +42,9 @@
>  #include <net/udp.h>
>  #include <net/tcp.h>
>  #include <net/tcp_states.h>
> +#include <net/af_vsock.h>
>  #include <linux/uaccess.h>
> +#include <asm/uaccess.h>
>  #include <asm/ioctls.h>
>  #include <trace/events/skb.h>
>  
> @@ -64,7 +66,7 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *, struct socket *,
>  static int		svc_udp_recvfrom(struct svc_rqst *);
>  static int		svc_udp_sendto(struct svc_rqst *);
>  static void		svc_sock_detach(struct svc_xprt *);
> -static void		svc_tcp_sock_detach(struct svc_xprt *);
> +static void		svc_common_sock_detach(struct svc_xprt *);
>  static void		svc_sock_free(struct svc_xprt *);
>  
>  static struct svc_xprt *svc_create_socket(struct svc_serv *, int,
> @@ -810,9 +812,9 @@ static void svc_tcp_state_change(struct sock *sk)
>  }
>  
>  /*
> - * Accept a TCP connection
> + * Accept an incoming connection
>   */
> -static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
> +static struct svc_xprt *svc_common_accept(struct svc_xprt *xprt)
>  {
>  	struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
>  	struct sockaddr_storage addr;
> @@ -824,7 +826,7 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
>  	int		err, slen;
>  	RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]);
>  
> -	dprintk("svc: tcp_accept %p sock %p\n", svsk, sock);
> +	dprintk("svc: %s %p sock %p\n", __func__, svsk, sock);
>  	if (!sock)
>  		return NULL;
>  
> @@ -877,7 +879,7 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
>  	svc_xprt_set_remote(&newsvsk->sk_xprt, sin, slen);
>  	err = kernel_getsockname(newsock, sin, &slen);
>  	if (unlikely(err < 0)) {
> -		dprintk("svc_tcp_accept: kernel_getsockname error %d\n", -err);
> +		dprintk("%s: kernel_getsockname error %d\n", __func__, -err);
>  		slen = offsetof(struct sockaddr, sa_data);
>  	}
>  	svc_xprt_set_local(&newsvsk->sk_xprt, sin, slen);
> @@ -1131,7 +1133,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
>  		rqstp->rq_arg.page_len = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
>  
>  	rqstp->rq_xprt_ctxt   = NULL;
> -	rqstp->rq_prot	      = IPPROTO_TCP;
> +	rqstp->rq_prot	      = IPPROTO_TCP; /* TODO vsock should either use 0 or IPPROTO_MAX */
>  	if (test_bit(XPT_LOCAL, &svsk->sk_xprt.xpt_flags))
>  		set_bit(RQ_LOCAL, &rqstp->rq_flags);
>  	else
> @@ -1200,13 +1202,13 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
>  }
>  
>  /*
> - * Setup response header. TCP has a 4B record length field.
> + * Setup response header. Byte stream transports have a 4B record length field.
>   */
> -static void svc_tcp_prep_reply_hdr(struct svc_rqst *rqstp)
> +static void svc_stream_prep_reply_hdr(struct svc_rqst *rqstp)
>  {
>  	struct kvec *resv = &rqstp->rq_res.head[0];
>  
> -	/* tcp needs a space for the record length... */
> +	/* space for the record length... */
>  	svc_putnl(resv, 0);
>  }
>  
> @@ -1232,7 +1234,7 @@ static struct svc_xprt_ops svc_tcp_bc_ops = {
>  	.xpo_create = svc_bc_create_socket,
>  	.xpo_detach = svc_bc_tcp_sock_detach,
>  	.xpo_free = svc_bc_sock_free,
> -	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
> +	.xpo_prep_reply_hdr = svc_stream_prep_reply_hdr,
>  	.xpo_secure_port = svc_sock_secure_port,
>  };
>  
> @@ -1244,6 +1246,145 @@ static struct svc_xprt_class svc_tcp_bc_class = {
>  };
>  
>  #ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +/*
> + * A data_ready event on a listening socket means there's a connection
> + * pending. Do not use state_change as a substitute for it.
> + */
> +static void svc_vsock_listen_data_ready(struct sock *sk)
> +{
> +	struct svc_sock	*svsk = (struct svc_sock *)sk->sk_user_data;
> +	wait_queue_head_t *wq;
> +
> +	dprintk("svc: socket %p AF_VSOCK (listen) state change %d\n",
> +		sk, sk->sk_state);
> +
> +	/*
> +	 * This callback may called twice when a new connection
> +	 * is established as a child socket inherits everything
> +	 * from a parent VSOCK_SS_LISTEN socket.
> +	 * 1) data_ready method of the parent socket will be called
> +	 *    when one of child sockets become SS_CONNECTED.
> +	 * 2) data_ready method of the child socket may be called
> +	 *    when it receives data before the socket is accepted.
> +	 * In case of 2, we should ignore it silently.
> +	 */
> +	if (sk->sk_state == VSOCK_SS_LISTEN) {
> +		if (svsk) {
> +			set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
> +			svc_xprt_enqueue(&svsk->sk_xprt);
> +		} else
> +			printk("svc: socket %p: no user data\n", sk);
> +	}
> +
> +	wq = sk_sleep(sk);
> +	if (wq && waitqueue_active(wq))
> +		wake_up_interruptible_all(wq);
> +}
> +
> +/*
> + * A state change on a connected socket means it's dying or dead.
> + */
> +static void svc_vsock_state_change(struct sock *sk)
> +{
> +	struct svc_sock	*svsk = (struct svc_sock *)sk->sk_user_data;
> +	wait_queue_head_t *wq = sk_sleep(sk);
> +
> +	dprintk("svc: socket %p AF_VSOCK (connected) state change %d (svsk %p)\n",
> +		sk, sk->sk_state, sk->sk_user_data);
> +
> +	if (!svsk)
> +		printk("svc: socket %p: no user data\n", sk);
> +	else {
> +		set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> +		svc_xprt_enqueue(&svsk->sk_xprt);
> +	}
> +	if (wq && waitqueue_active(wq))
> +		wake_up_interruptible_all(wq);
> +}
> +
> +static void svc_vsock_data_ready(struct sock *sk)
> +{
> +	struct svc_sock *svsk = (struct svc_sock *)sk->sk_user_data;
> +	wait_queue_head_t *wq = sk_sleep(sk);
> +
> +	dprintk("svc: socket %p AF_VSOCK data ready (svsk %p)\n",
> +		sk, sk->sk_user_data);
> +	if (svsk) {
> +		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
> +		svc_xprt_enqueue(&svsk->sk_xprt);
> +	}
> +	if (wq && waitqueue_active(wq))
> +		wake_up_interruptible(wq);
> +}
> +
> +static int svc_vsock_has_wspace(struct svc_xprt *xprt)
> +{
> +	struct svc_sock *svsk =	container_of(xprt, struct svc_sock, sk_xprt);
> +
> +	if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
> +		return 1;
> +	return !test_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
> +}
> +
> +static struct svc_xprt *svc_vsock_create(struct svc_serv *serv,
> +					 struct net *net,
> +					 struct sockaddr *sa, int salen,
> +					 int flags)
> +{
> +	return svc_create_socket(serv, 0, net, sa, salen, flags);
> +}
> +
> +static struct svc_xprt_ops svc_vsock_ops = {
> +	.xpo_create = svc_vsock_create,
> +	.xpo_recvfrom = svc_tcp_recvfrom,
> +	.xpo_sendto = svc_tcp_sendto,
> +	.xpo_release_rqst = svc_release_skb,
> +	.xpo_detach = svc_common_sock_detach,
> +	.xpo_free = svc_sock_free,
> +	.xpo_prep_reply_hdr = svc_stream_prep_reply_hdr,
> +	.xpo_has_wspace = svc_vsock_has_wspace,
> +	.xpo_accept = svc_common_accept,
> +	.xpo_secure_port = svc_sock_secure_port,
> +};
> +
> +static struct svc_xprt_class svc_vsock_class = {
> +	.xcl_name = "vsock",
> +	.xcl_owner = THIS_MODULE,
> +	.xcl_ops = &svc_vsock_ops,
> +	.xcl_max_payload = RPCSVC_MAXPAYLOAD,
> +	.xcl_ident = XPRT_TRANSPORT_VSOCK,
> +};
> +
> +static void svc_vsock_init(struct svc_sock *svsk, struct svc_serv *serv)
> +{
> +	struct sock *sk = svsk->sk_sk;
> +
> +	svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_vsock_class,
> +		      &svsk->sk_xprt, serv);
> +	set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
> +	set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
> +	if (sk->sk_state == VSOCK_SS_LISTEN) {
> +		dprintk("setting up AF_VSOCK socket for listening\n");
> +		set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
> +		sk->sk_data_ready = svc_vsock_listen_data_ready;
> +		set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
> +	} else {
> +		dprintk("setting up AF_VSOCK socket for reading\n");
> +		sk->sk_state_change = svc_vsock_state_change;
> +		sk->sk_data_ready = svc_vsock_data_ready;
> +		sk->sk_write_space = svc_write_space;
> +
> +		svsk->sk_reclen = 0;
> +		svsk->sk_tcplen = 0;
> +		svsk->sk_datalen = 0;
> +		memset(&svsk->sk_pages[0], 0, sizeof(svsk->sk_pages));
> +
> +		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
> +		if (sk->sk_state != SS_CONNECTED)
> +			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> +	}
> +}
> +
>  static void svc_bc_vsock_sock_detach(struct svc_xprt *xprt)
>  {
>  }
> @@ -1252,7 +1393,7 @@ static struct svc_xprt_ops svc_vsock_bc_ops = {
>  	.xpo_create = svc_bc_create_socket,
>  	.xpo_detach = svc_bc_vsock_sock_detach,
>  	.xpo_free = svc_bc_sock_free,
> -	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
> +	.xpo_prep_reply_hdr = svc_stream_prep_reply_hdr,
>  	.xpo_secure_port = svc_sock_secure_port,
>  };
>  
> @@ -1294,11 +1435,11 @@ static struct svc_xprt_ops svc_tcp_ops = {
>  	.xpo_recvfrom = svc_tcp_recvfrom,
>  	.xpo_sendto = svc_tcp_sendto,
>  	.xpo_release_rqst = svc_release_skb,
> -	.xpo_detach = svc_tcp_sock_detach,
> +	.xpo_detach = svc_common_sock_detach,
>  	.xpo_free = svc_sock_free,
> -	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
> +	.xpo_prep_reply_hdr = svc_stream_prep_reply_hdr,
>  	.xpo_has_wspace = svc_tcp_has_wspace,
> -	.xpo_accept = svc_tcp_accept,
> +	.xpo_accept = svc_common_accept,
>  	.xpo_secure_port = svc_sock_secure_port,
>  	.xpo_kill_temp_xprt = svc_tcp_kill_temp_xprt,
>  };
> @@ -1315,6 +1456,9 @@ void svc_init_xprt_sock(void)
>  {
>  	svc_reg_xprt_class(&svc_tcp_class);
>  	svc_reg_xprt_class(&svc_udp_class);
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +	svc_reg_xprt_class(&svc_vsock_class);
> +#endif
>  	svc_init_bc_xprt_sock();
>  }
>  
> @@ -1322,6 +1466,9 @@ void svc_cleanup_xprt_sock(void)
>  {
>  	svc_unreg_xprt_class(&svc_tcp_class);
>  	svc_unreg_xprt_class(&svc_udp_class);
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +	svc_unreg_xprt_class(&svc_vsock_class);
> +#endif
>  	svc_cleanup_bc_xprt_sock();
>  }
>  
> @@ -1417,6 +1564,10 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
>  		svc_udp_init(svsk, serv);
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +	else if (inet->sk_family == AF_VSOCK)
> +		svc_vsock_init(svsk, serv);
> +#endif
>  	else
>  		svc_tcp_init(svsk, serv);

The above conditional is a bit of a mess and doesn't really handle the
case where the upper layer might pass it something non-sensical (i.e.
SOCK_DGRAM + AF_VSOCK).

I think this should vet both values and return ERR_PTR(-EINVAL) if it's
not right. Maybe a switch statement on the address family and then check
the sock->type in each? We can afford to code a little defensively here.


>  
> @@ -1468,13 +1619,22 @@ int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
>  
>  	if (!so)
>  		return err;
> -	err = -EAFNOSUPPORT;
> -	if ((so->sk->sk_family != PF_INET) && (so->sk->sk_family != PF_INET6))
> -		goto out;
> -	err =  -EPROTONOSUPPORT;
> -	if (so->sk->sk_protocol != IPPROTO_TCP &&
> -	    so->sk->sk_protocol != IPPROTO_UDP)
> +	err = -EPROTONOSUPPORT;
> +	switch (so->sk->sk_family) {
> +	case PF_INET:
> +	case PF_INET6:
> +		if (so->sk->sk_protocol != IPPROTO_TCP &&
> +		    so->sk->sk_protocol != IPPROTO_UDP)
> +			goto out;
> +		break;
> +	case PF_VSOCK:
> +		if (so->sk->sk_protocol != 0)
> +			goto out;
> +		break;
> +	default:
> +		err = -EAFNOSUPPORT;
>  		goto out;
> +	}
>  	err = -EISCONN;
>  	if (so->state > SS_UNCONNECTED)
>  		goto out;
> @@ -1521,7 +1681,8 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>  			serv->sv_program->pg_name, protocol,
>  			__svc_print_addr(sin, buf, sizeof(buf)));
>  
> -	if (protocol != IPPROTO_UDP && protocol != IPPROTO_TCP) {
> +	if (sin->sa_family != AF_VSOCK &&
> +	    protocol != IPPROTO_UDP && protocol != IPPROTO_TCP) {
>  		printk(KERN_WARNING "svc: only UDP and TCP "
>  				"sockets supported\n");
>  		return ERR_PTR(-EINVAL);
> @@ -1535,6 +1696,11 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>  	case AF_INET:
>  		family = PF_INET;
>  		break;
> +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> +	case AF_VSOCK:
> +		family = PF_VSOCK;
> +		break;
> +#endif
>  	default:
>  		return ERR_PTR(-EINVAL);
>  	}
> @@ -1566,7 +1732,7 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>  	if (error < 0)
>  		goto bummer;
>  
> -	if (protocol == IPPROTO_TCP) {
> +	if (type == SOCK_STREAM) {
>  		if ((error = kernel_listen(sock, 64)) < 0)
>  			goto bummer;
>  	}
> @@ -1607,11 +1773,11 @@ static void svc_sock_detach(struct svc_xprt *xprt)
>  /*
>   * Disconnect the socket, and reset the callbacks
>   */
> -static void svc_tcp_sock_detach(struct svc_xprt *xprt)
> +static void svc_common_sock_detach(struct svc_xprt *xprt)
>  {
>  	struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
>  
> -	dprintk("svc: svc_tcp_sock_detach(%p)\n", svsk);
> +	dprintk("svc: %s(%p)\n", __func__, svsk);
>  
>  	svc_sock_detach(xprt);
>  

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 13/14] SUNRPC: vsock svcsock support
  2017-11-07 14:12   ` Jeff Layton
@ 2017-11-14 14:20     ` Stefan Hajnoczi
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-11-14 14:20 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Abbas Naderi, Anna Schumaker, Trond Myklebust,
	J. Bruce Fields, Chuck Lever

[-- Attachment #1: Type: text/plain, Size: 938 bytes --]

On Tue, Nov 07, 2017 at 09:12:31AM -0500, Jeff Layton wrote:
> On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> > @@ -1417,6 +1564,10 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> >  	/* Initialize the socket */
> >  	if (sock->type == SOCK_DGRAM)
> >  		svc_udp_init(svsk, serv);
> > +#ifdef CONFIG_SUNRPC_XPRT_VSOCK
> > +	else if (inet->sk_family == AF_VSOCK)
> > +		svc_vsock_init(svsk, serv);
> > +#endif
> >  	else
> >  		svc_tcp_init(svsk, serv);
> 
> The above conditional is a bit of a mess and doesn't really handle the
> case where the upper layer might pass it something non-sensical (i.e.
> SOCK_DGRAM + AF_VSOCK).
> 
> I think this should vet both values and return ERR_PTR(-EINVAL) if it's
> not right. Maybe a switch statement on the address family and then check
> the sock->type in each? We can afford to code a little defensively here.

Good idea, will fix in v4.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 06/14] SUNRPC: add AF_VSOCK support to xprtsock.c
  2017-11-07 13:46   ` Jeff Layton
@ 2017-11-14 16:45     ` Stefan Hajnoczi
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-11-14 16:45 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Abbas Naderi, Anna Schumaker, Trond Myklebust,
	J. Bruce Fields, Chuck Lever

[-- Attachment #1: Type: text/plain, Size: 2917 bytes --]

On Tue, Nov 07, 2017 at 08:46:14AM -0500, Jeff Layton wrote:
> On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> > +/**
> > + * xs_vsock_error_report - callback to handle vsock socket state errors
> > + * @sk: socket
> > + *
> > + * Note: we don't call sock_error() since there may be a rpc_task
> > + * using the socket, and so we don't want to clear sk->sk_err.
> > + */
> > +static void xs_vsock_error_report(struct sock *sk)
> > +{
> > +	struct rpc_xprt *xprt;
> > +	int err;
> > +
> > +	read_lock_bh(&sk->sk_callback_lock);
> > +	if (!(xprt = xprt_from_sock(sk)))
> > +		goto out;
> > +
> > +	err = -sk->sk_err;
> > +	if (err == 0)
> > +		goto out;
> > +	/* Is this a reset event? */
> > +	if (sk->sk_state == SS_UNCONNECTED)
> > +		xs_sock_mark_closed(xprt);
> > +	dprintk("RPC:       %s client %p, error=%d...\n",
> > +			__func__, xprt, -err);
> > +	trace_rpc_socket_error(xprt, sk->sk_socket, err);
> > +	xprt_wake_pending_tasks(xprt, err);
> > + out:
> > +	read_unlock_bh(&sk->sk_callback_lock);
> > +}
> 
> Hmm ok...so we have this to avoid some TCP specific stuff in
> xs_error_report, I guess?
> 
> I wonder if AF_LOCAL transport should be using the function above,
> rather than xs_error_report? If so, maybe we should rename:
> 
>     xs_error_report -> xs_tcp_error_report
>     xs_vsock_error_report -> xs_stream_error_report
> 
> Might be good to do that cleanup first as a preparatory patch.

AF_LOCAL's use of xs_error_report() is fine because AF_UNIX uses TCP_*
constants for sk->sk_state.

VSOCK has now switched to TCP_* sk_state constants in Dave Miller's
net-next tree.  I made that change for unrelated reasons, but it means
v4 of this patch series will share the xs_error_report() function with
TCP and AF_LOCAL.

> > +
> > +/**
> > + * xs_vsock_finish_connecting - initialize and connect socket
> > + */
> > +static int xs_vsock_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> > +{
> > +	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
> > +	int ret = -ENOTCONN;
> > +
> > +	if (!transport->inet) {
> > +		struct sock *sk = sock->sk;
> > +
> > +		write_lock_bh(&sk->sk_callback_lock);
> > +
> > +		xs_save_old_callbacks(transport, sk);
> > +
> > +		sk->sk_user_data = xprt;
> > +		sk->sk_data_ready = xs_data_ready;
> > +		sk->sk_state_change = xs_vsock_state_change;
> > +		sk->sk_write_space = xs_tcp_write_space;
> 
> Might should rename xs_tcp_write_space to xs_stream_write_space?

Yes, will fix in v4.

> > +		sk->sk_error_report = xs_vsock_error_report;
> > +		sk->sk_allocation = GFP_ATOMIC;
> 
> Why GFP_ATOMIC here? The other finish routines use GFP_NOIO.

I missed the memo on commit c2126157ea3c4f72b315749e0c07a1b162a2fe2b
("SUNRPC: Allow sockets to do GFP_NOIO allocations").  It is now okay to
use GFP_NOIO.

Thank you, will fix in v4.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-11-07 14:01       ` Jeff Layton
@ 2017-11-16 15:25         ` Stefan Hajnoczi
  2017-11-16 20:53           ` Chuck Lever
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-11-16 15:25 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Abbas Naderi, Anna Schumaker, Trond Myklebust,
	J. Bruce Fields, Chuck Lever

[-- Attachment #1: Type: text/plain, Size: 2342 bytes --]

On Tue, Nov 07, 2017 at 09:01:26AM -0500, Jeff Layton wrote:
> On Tue, 2017-11-07 at 13:31 +0000, Stefan Hajnoczi wrote:
> > On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
> > > On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> > > > @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
> > > >  	case AF_INET6:
> > > >  		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
> > > >  			< PROT_SOCK;
> > > > +	case AF_VSOCK:
> > > > +		return ((struct sockaddr_vm *)sin)->svm_port <=
> > > > +			LAST_RESERVED_PORT;
> > > > +
> > > >  	default:
> > > >  		return 0;
> > > >  	}
> > > 
> > > Does vsock even have the concept of a privileged port? I would imagine
> > > that root in a guest VM would carry no particular significance from an
> > > export security standpoint
> > > 
> > > Since you're defining a new transport here, it might be nice write the
> > > RFCs to avoid that distinction, if possible.
> > > 
> > > Note that RDMA just has svc_port_is_privileged always return 1.
> > 
> > AF_VSOCK has the same 0-1023 privileged port range as TCP.
> > 
> 
> But why? And, given that you have 32-bits for a port with AF_VSOCK vs
> the 16 bits on an AF_INET/AF_INET6, why is the range so pitifully small?
> 
> Reserved ports are a bit of a dinosaur holdover from when being root on
> a machine on the Internet meant something. With v4.1 it's much less of
> an issue, but in the "olden days", reserved port exhaustion could be a
> real problem.
> 
> Mandating low ports can also be a problem in other way. Some well known
> services use ports in the ephemeral range, and if your service starts
> late and someone else has taken the port for an ephemeral one, you're
> out of luck.
> 
> I think we have to ask ourselves:
> 
> Should the ability to open a low port inside of a VM carry any
> significance at all to an RPC server? I'd suggest not, and I think it'd
> be good to word the RFC to make that explicitly clear.

AF_VSOCK has had the reserved port range since it was first merged in
2013.  That's before my time but I do see some use for identifying
connections coming from privileged processes.

Given that TCP has the same privileged port range, is there any reason
why AF_VSOCK would be any worse off than TCP for having it?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-11-16 15:25         ` Stefan Hajnoczi
@ 2017-11-16 20:53           ` Chuck Lever
  2017-11-20 16:31             ` Stefan Hajnoczi
  2017-11-26 11:58             ` Jeff Layton
  0 siblings, 2 replies; 35+ messages in thread
From: Chuck Lever @ 2017-11-16 20:53 UTC (permalink / raw)
  To: Stefan Hajnoczi, Jeff Layton
  Cc: Linux NFS Mailing List, Abbas Naderi, Anna Schumaker,
	Trond Myklebust, Bruce Fields


> On Nov 16, 2017, at 10:25 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Tue, Nov 07, 2017 at 09:01:26AM -0500, Jeff Layton wrote:
>> On Tue, 2017-11-07 at 13:31 +0000, Stefan Hajnoczi wrote:
>>> On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
>>>> On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
>>>>> @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
>>>>> 	case AF_INET6:
>>>>> 		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
>>>>> 			< PROT_SOCK;
>>>>> +	case AF_VSOCK:
>>>>> +		return ((struct sockaddr_vm *)sin)->svm_port <=
>>>>> +			LAST_RESERVED_PORT;
>>>>> +
>>>>> 	default:
>>>>> 		return 0;
>>>>> 	}
>>>> 
>>>> Does vsock even have the concept of a privileged port? I would imagine
>>>> that root in a guest VM would carry no particular significance from an
>>>> export security standpoint
>>>> 
>>>> Since you're defining a new transport here, it might be nice write the
>>>> RFCs to avoid that distinction, if possible.
>>>> 
>>>> Note that RDMA just has svc_port_is_privileged always return 1.
>>> 
>>> AF_VSOCK has the same 0-1023 privileged port range as TCP.
>>> 
>> 
>> But why? And, given that you have 32-bits for a port with AF_VSOCK vs
>> the 16 bits on an AF_INET/AF_INET6, why is the range so pitifully small?
>> 
>> Reserved ports are a bit of a dinosaur holdover from when being root on
>> a machine on the Internet meant something. With v4.1 it's much less of
>> an issue, but in the "olden days", reserved port exhaustion could be a
>> real problem.
>> 
>> Mandating low ports can also be a problem in other way. Some well known
>> services use ports in the ephemeral range, and if your service starts
>> late and someone else has taken the port for an ephemeral one, you're
>> out of luck.
>> 
>> I think we have to ask ourselves:
>> 
>> Should the ability to open a low port inside of a VM carry any
>> significance at all to an RPC server? I'd suggest not, and I think it'd
>> be good to word the RFC to make that explicitly clear.
> 
> AF_VSOCK has had the reserved port range since it was first merged in
> 2013.  That's before my time but I do see some use for identifying
> connections coming from privileged processes.
> 
> Given that TCP has the same privileged port range, is there any reason
> why AF_VSOCK would be any worse off than TCP for having it?

I agree with Jeff that we need to think carefully about this.

I don't especially care for the privileged port check, but:

In this case, you are inventing an RPC transport that makes
it impossible to use strong security (ie, RPCSEC_GSS). We
should be careful about removing the check because only
AUTH_NULL and AUTH_UNIX security can be used in this kind
of deployment.

Privileged ports are easy to spoof if there is an opportunity
for a MitM attack to alter the port number of RPCs in transit.
With VSOCK there may be no such opportunity, thus privileged
ports might provide an adequate level of security here. I
make that claim with no deep experience of VSOCK environments.

When writing the VSOCK-related RFCs, you will need to provide
a very clear and concise rationale to the IESG for purposely
not supporting the use of RPCSEC_GSS. It will start with "well,
these RPCs do not flow on open networks and are thus not
subject to MitM attacks"; then proceed to careful discussion of
how the server will guard against rogue users on guests, and
assumptions about the trust relationship between the guests
and the host. You will have to make AUTH_UNIX into a defensible
deployment choice, and port privilege might be a part of that.

Note also that the NFSv4 standards require that implementations
can support RPCSEC_GSS. NFSv4 on VSOCK cannot. Something will
have to be done about that.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-11-16 20:53           ` Chuck Lever
@ 2017-11-20 16:31             ` Stefan Hajnoczi
  2017-11-26 11:58             ` Jeff Layton
  1 sibling, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2017-11-20 16:31 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Jeff Layton, Linux NFS Mailing List, Abbas Naderi,
	Anna Schumaker, Trond Myklebust, Bruce Fields

[-- Attachment #1: Type: text/plain, Size: 4126 bytes --]

On Thu, Nov 16, 2017 at 03:53:04PM -0500, Chuck Lever wrote:
> 
> > On Nov 16, 2017, at 10:25 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Tue, Nov 07, 2017 at 09:01:26AM -0500, Jeff Layton wrote:
> >> On Tue, 2017-11-07 at 13:31 +0000, Stefan Hajnoczi wrote:
> >>> On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
> >>>> On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> >>>>> @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
> >>>>> 	case AF_INET6:
> >>>>> 		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
> >>>>> 			< PROT_SOCK;
> >>>>> +	case AF_VSOCK:
> >>>>> +		return ((struct sockaddr_vm *)sin)->svm_port <=
> >>>>> +			LAST_RESERVED_PORT;
> >>>>> +
> >>>>> 	default:
> >>>>> 		return 0;
> >>>>> 	}
> >>>> 
> >>>> Does vsock even have the concept of a privileged port? I would imagine
> >>>> that root in a guest VM would carry no particular significance from an
> >>>> export security standpoint
> >>>> 
> >>>> Since you're defining a new transport here, it might be nice write the
> >>>> RFCs to avoid that distinction, if possible.
> >>>> 
> >>>> Note that RDMA just has svc_port_is_privileged always return 1.
> >>> 
> >>> AF_VSOCK has the same 0-1023 privileged port range as TCP.
> >>> 
> >> 
> >> But why? And, given that you have 32-bits for a port with AF_VSOCK vs
> >> the 16 bits on an AF_INET/AF_INET6, why is the range so pitifully small?
> >> 
> >> Reserved ports are a bit of a dinosaur holdover from when being root on
> >> a machine on the Internet meant something. With v4.1 it's much less of
> >> an issue, but in the "olden days", reserved port exhaustion could be a
> >> real problem.
> >> 
> >> Mandating low ports can also be a problem in other way. Some well known
> >> services use ports in the ephemeral range, and if your service starts
> >> late and someone else has taken the port for an ephemeral one, you're
> >> out of luck.
> >> 
> >> I think we have to ask ourselves:
> >> 
> >> Should the ability to open a low port inside of a VM carry any
> >> significance at all to an RPC server? I'd suggest not, and I think it'd
> >> be good to word the RFC to make that explicitly clear.
> > 
> > AF_VSOCK has had the reserved port range since it was first merged in
> > 2013.  That's before my time but I do see some use for identifying
> > connections coming from privileged processes.
> > 
> > Given that TCP has the same privileged port range, is there any reason
> > why AF_VSOCK would be any worse off than TCP for having it?
> 
> I agree with Jeff that we need to think carefully about this.
> 
> I don't especially care for the privileged port check, but:
> 
> In this case, you are inventing an RPC transport that makes
> it impossible to use strong security (ie, RPCSEC_GSS). We
> should be careful about removing the check because only
> AUTH_NULL and AUTH_UNIX security can be used in this kind
> of deployment.
> 
> Privileged ports are easy to spoof if there is an opportunity
> for a MitM attack to alter the port number of RPCs in transit.
> With VSOCK there may be no such opportunity, thus privileged
> ports might provide an adequate level of security here. I
> make that claim with no deep experience of VSOCK environments.
> 
> When writing the VSOCK-related RFCs, you will need to provide
> a very clear and concise rationale to the IESG for purposely
> not supporting the use of RPCSEC_GSS. It will start with "well,
> these RPCs do not flow on open networks and are thus not
> subject to MitM attacks"; then proceed to careful discussion of
> how the server will guard against rogue users on guests, and
> assumptions about the trust relationship between the guests
> and the host. You will have to make AUTH_UNIX into a defensible
> deployment choice, and port privilege might be a part of that.
> 
> Note also that the NFSv4 standards require that implementations
> can support RPCSEC_GSS. NFSv4 on VSOCK cannot. Something will
> have to be done about that.

Thanks!  I will cover it in the draft RFC.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-11-16 20:53           ` Chuck Lever
  2017-11-20 16:31             ` Stefan Hajnoczi
@ 2017-11-26 11:58             ` Jeff Layton
  2017-11-26 15:53               ` Chuck Lever
  1 sibling, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2017-11-26 11:58 UTC (permalink / raw)
  To: Chuck Lever, Stefan Hajnoczi
  Cc: Linux NFS Mailing List, Abbas Naderi, Anna Schumaker,
	Trond Myklebust, Bruce Fields

On Thu, 2017-11-16 at 15:53 -0500, Chuck Lever wrote:
> > On Nov 16, 2017, at 10:25 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Tue, Nov 07, 2017 at 09:01:26AM -0500, Jeff Layton wrote:
> > > On Tue, 2017-11-07 at 13:31 +0000, Stefan Hajnoczi wrote:
> > > > On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
> > > > > On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> > > > > > @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
> > > > > > 	case AF_INET6:
> > > > > > 		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
> > > > > > 			< PROT_SOCK;
> > > > > > +	case AF_VSOCK:
> > > > > > +		return ((struct sockaddr_vm *)sin)->svm_port <=
> > > > > > +			LAST_RESERVED_PORT;
> > > > > > +
> > > > > > 	default:
> > > > > > 		return 0;
> > > > > > 	}
> > > > > 
> > > > > Does vsock even have the concept of a privileged port? I would imagine
> > > > > that root in a guest VM would carry no particular significance from an
> > > > > export security standpoint
> > > > > 
> > > > > Since you're defining a new transport here, it might be nice write the
> > > > > RFCs to avoid that distinction, if possible.
> > > > > 
> > > > > Note that RDMA just has svc_port_is_privileged always return 1.
> > > > 
> > > > AF_VSOCK has the same 0-1023 privileged port range as TCP.
> > > > 
> > > 
> > > But why? And, given that you have 32-bits for a port with AF_VSOCK vs
> > > the 16 bits on an AF_INET/AF_INET6, why is the range so pitifully small?
> > > 
> > > Reserved ports are a bit of a dinosaur holdover from when being root on
> > > a machine on the Internet meant something. With v4.1 it's much less of
> > > an issue, but in the "olden days", reserved port exhaustion could be a
> > > real problem.
> > > 
> > > Mandating low ports can also be a problem in other way. Some well known
> > > services use ports in the ephemeral range, and if your service starts
> > > late and someone else has taken the port for an ephemeral one, you're
> > > out of luck.
> > > 
> > > I think we have to ask ourselves:
> > > 
> > > Should the ability to open a low port inside of a VM carry any
> > > significance at all to an RPC server? I'd suggest not, and I think it'd
> > > be good to word the RFC to make that explicitly clear.
> > 
> > AF_VSOCK has had the reserved port range since it was first merged in
> > 2013.  That's before my time but I do see some use for identifying
> > connections coming from privileged processes.
> > 
> > Given that TCP has the same privileged port range, is there any reason
> > why AF_VSOCK would be any worse off than TCP for having it?
> 
> I agree with Jeff that we need to think carefully about this.
> 
> I don't especially care for the privileged port check, but:
> 
> In this case, you are inventing an RPC transport that makes
> it impossible to use strong security (ie, RPCSEC_GSS). We
> should be careful about removing the check because only
> AUTH_NULL and AUTH_UNIX security can be used in this kind
> of deployment.
> 

I know we've discussed this a bit, but does this transport _really_
preclude us from using RPCSEC_GSS? I know we don't have IP addresses
here, but hosts on either end of a vsocket will have hostnames.

WRT kerberos, I don't see a reason why both hosts couldn't communicate
with a KDC via other means, get tickets and then use those for
authenticating over their vsock connection. vsock might make it harder
to determine what SPN to use, but we could potentially work around that
in other ways.

> Privileged ports are easy to spoof if there is an opportunity
> for a MitM attack to alter the port number of RPCs in transit.
> With VSOCK there may be no such opportunity, thus privileged
> ports might provide an adequate level of security here. I
> make that claim with no deep experience of VSOCK environments.
> 
> When writing the VSOCK-related RFCs, you will need to provide
> a very clear and concise rationale to the IESG for purposely
> not supporting the use of RPCSEC_GSS. It will start with "well,
> these RPCs do not flow on open networks and are thus not
> subject to MitM attacks"; then proceed to careful discussion of
> how the server will guard against rogue users on guests, and
> assumptions about the trust relationship between the guests
> and the host. You will have to make AUTH_UNIX into a defensible
> deployment choice, and port privilege might be a part of that.
> 
> Note also that the NFSv4 standards require that implementations
> can support RPCSEC_GSS. NFSv4 on VSOCK cannot. Something will
> have to be done about that.
> 

Good point on the reserved port value for AUTH_UNIX. That may be reason
enough to keep it in.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-11-26 11:58             ` Jeff Layton
@ 2017-11-26 15:53               ` Chuck Lever
  2017-11-27 16:46                 ` Bruce Fields
  0 siblings, 1 reply; 35+ messages in thread
From: Chuck Lever @ 2017-11-26 15:53 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Stefan Hajnoczi, Linux NFS Mailing List, Abbas Naderi,
	Anna Schumaker, Trond Myklebust, Bruce Fields


> On Nov 26, 2017, at 6:58 AM, Jeff Layton <jlayton@redhat.com> wrote:
> 
> On Thu, 2017-11-16 at 15:53 -0500, Chuck Lever wrote:
>>> On Nov 16, 2017, at 10:25 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> 
>>> On Tue, Nov 07, 2017 at 09:01:26AM -0500, Jeff Layton wrote:
>>>> On Tue, 2017-11-07 at 13:31 +0000, Stefan Hajnoczi wrote:
>>>>> On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
>>>>>> On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
>>>>>>> @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
>>>>>>> 	case AF_INET6:
>>>>>>> 		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
>>>>>>> 			< PROT_SOCK;
>>>>>>> +	case AF_VSOCK:
>>>>>>> +		return ((struct sockaddr_vm *)sin)->svm_port <=
>>>>>>> +			LAST_RESERVED_PORT;
>>>>>>> +
>>>>>>> 	default:
>>>>>>> 		return 0;
>>>>>>> 	}
>>>>>> 
>>>>>> Does vsock even have the concept of a privileged port? I would imagine
>>>>>> that root in a guest VM would carry no particular significance from an
>>>>>> export security standpoint
>>>>>> 
>>>>>> Since you're defining a new transport here, it might be nice write the
>>>>>> RFCs to avoid that distinction, if possible.
>>>>>> 
>>>>>> Note that RDMA just has svc_port_is_privileged always return 1.
>>>>> 
>>>>> AF_VSOCK has the same 0-1023 privileged port range as TCP.
>>>>> 
>>>> 
>>>> But why? And, given that you have 32-bits for a port with AF_VSOCK vs
>>>> the 16 bits on an AF_INET/AF_INET6, why is the range so pitifully small?
>>>> 
>>>> Reserved ports are a bit of a dinosaur holdover from when being root on
>>>> a machine on the Internet meant something. With v4.1 it's much less of
>>>> an issue, but in the "olden days", reserved port exhaustion could be a
>>>> real problem.
>>>> 
>>>> Mandating low ports can also be a problem in other way. Some well known
>>>> services use ports in the ephemeral range, and if your service starts
>>>> late and someone else has taken the port for an ephemeral one, you're
>>>> out of luck.
>>>> 
>>>> I think we have to ask ourselves:
>>>> 
>>>> Should the ability to open a low port inside of a VM carry any
>>>> significance at all to an RPC server? I'd suggest not, and I think it'd
>>>> be good to word the RFC to make that explicitly clear.
>>> 
>>> AF_VSOCK has had the reserved port range since it was first merged in
>>> 2013.  That's before my time but I do see some use for identifying
>>> connections coming from privileged processes.
>>> 
>>> Given that TCP has the same privileged port range, is there any reason
>>> why AF_VSOCK would be any worse off than TCP for having it?
>> 
>> I agree with Jeff that we need to think carefully about this.
>> 
>> I don't especially care for the privileged port check, but:
>> 
>> In this case, you are inventing an RPC transport that makes
>> it impossible to use strong security (ie, RPCSEC_GSS). We
>> should be careful about removing the check because only
>> AUTH_NULL and AUTH_UNIX security can be used in this kind
>> of deployment.
>> 
> 
> I know we've discussed this a bit, but does this transport _really_
> preclude us from using RPCSEC_GSS? I know we don't have IP addresses
> here, but hosts on either end of a vsocket will have hostnames.

Yes, even for AUTH_UNIX, something has to go in the "hostname"
field in the credential. Let's say the guest's uname.


> WRT kerberos, I don't see a reason why both hosts couldn't communicate
> with a KDC via other means, get tickets and then use those for
> authenticating over their vsock connection. vsock might make it harder
> to determine what SPN to use, but we could potentially work around that
> in other ways.

"No network configuration" implies to me that the KDC (or
a proxy for it) would have to reside on the host.

Indeed, there's no a priori way of determining the NFS
server SPN, other than to tell the guest somehow via a
manual administrative interface or we pick a well-known
value for it and make it part of the RPC-over-VSOCK spec.


>> Privileged ports are easy to spoof if there is an opportunity
>> for a MitM attack to alter the port number of RPCs in transit.
>> With VSOCK there may be no such opportunity, thus privileged
>> ports might provide an adequate level of security here. I
>> make that claim with no deep experience of VSOCK environments.
>> 
>> When writing the VSOCK-related RFCs, you will need to provide
>> a very clear and concise rationale to the IESG for purposely
>> not supporting the use of RPCSEC_GSS. It will start with "well,
>> these RPCs do not flow on open networks and are thus not
>> subject to MitM attacks"; then proceed to careful discussion of
>> how the server will guard against rogue users on guests, and
>> assumptions about the trust relationship between the guests
>> and the host. You will have to make AUTH_UNIX into a defensible
>> deployment choice, and port privilege might be a part of that.
>> 
>> Note also that the NFSv4 standards require that implementations
>> can support RPCSEC_GSS. NFSv4 on VSOCK cannot. Something will
>> have to be done about that.
>> 
> 
> Good point on the reserved port value for AUTH_UNIX. That may be reason
> enough to keep it in.

--
Chuck Lever




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-11-26 15:53               ` Chuck Lever
@ 2017-11-27 16:46                 ` Bruce Fields
  2017-11-27 17:34                   ` Jeff Layton
  0 siblings, 1 reply; 35+ messages in thread
From: Bruce Fields @ 2017-11-27 16:46 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Jeff Layton, Stefan Hajnoczi, Linux NFS Mailing List,
	Abbas Naderi, Anna Schumaker, Trond Myklebust

On Sun, Nov 26, 2017 at 10:53:45AM -0500, Chuck Lever wrote:
> 
> > On Nov 26, 2017, at 6:58 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > On Thu, 2017-11-16 at 15:53 -0500, Chuck Lever wrote:
> >>> On Nov 16, 2017, at 10:25 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>> 
> >>> On Tue, Nov 07, 2017 at 09:01:26AM -0500, Jeff Layton wrote:
> >>>> On Tue, 2017-11-07 at 13:31 +0000, Stefan Hajnoczi wrote:
> >>>>> On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
> >>>>>> On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> >>>>>>> @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
> >>>>>>> 	case AF_INET6:
> >>>>>>> 		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
> >>>>>>> 			< PROT_SOCK;
> >>>>>>> +	case AF_VSOCK:
> >>>>>>> +		return ((struct sockaddr_vm *)sin)->svm_port <=
> >>>>>>> +			LAST_RESERVED_PORT;
> >>>>>>> +
> >>>>>>> 	default:
> >>>>>>> 		return 0;
> >>>>>>> 	}
> >>>>>> 
> >>>>>> Does vsock even have the concept of a privileged port? I would imagine
> >>>>>> that root in a guest VM would carry no particular significance from an
> >>>>>> export security standpoint
> >>>>>> 
> >>>>>> Since you're defining a new transport here, it might be nice write the
> >>>>>> RFCs to avoid that distinction, if possible.
> >>>>>> 
> >>>>>> Note that RDMA just has svc_port_is_privileged always return 1.
> >>>>> 
> >>>>> AF_VSOCK has the same 0-1023 privileged port range as TCP.
> >>>>> 
> >>>> 
> >>>> But why? And, given that you have 32-bits for a port with AF_VSOCK vs
> >>>> the 16 bits on an AF_INET/AF_INET6, why is the range so pitifully small?
> >>>> 
> >>>> Reserved ports are a bit of a dinosaur holdover from when being root on
> >>>> a machine on the Internet meant something. With v4.1 it's much less of
> >>>> an issue, but in the "olden days", reserved port exhaustion could be a
> >>>> real problem.
> >>>> 
> >>>> Mandating low ports can also be a problem in other way. Some well known
> >>>> services use ports in the ephemeral range, and if your service starts
> >>>> late and someone else has taken the port for an ephemeral one, you're
> >>>> out of luck.
> >>>> 
> >>>> I think we have to ask ourselves:
> >>>> 
> >>>> Should the ability to open a low port inside of a VM carry any
> >>>> significance at all to an RPC server? I'd suggest not, and I think it'd
> >>>> be good to word the RFC to make that explicitly clear.
> >>> 
> >>> AF_VSOCK has had the reserved port range since it was first merged in
> >>> 2013.  That's before my time but I do see some use for identifying
> >>> connections coming from privileged processes.
> >>> 
> >>> Given that TCP has the same privileged port range, is there any reason
> >>> why AF_VSOCK would be any worse off than TCP for having it?
> >> 
> >> I agree with Jeff that we need to think carefully about this.
> >> 
> >> I don't especially care for the privileged port check, but:
> >> 
> >> In this case, you are inventing an RPC transport that makes
> >> it impossible to use strong security (ie, RPCSEC_GSS). We
> >> should be careful about removing the check because only
> >> AUTH_NULL and AUTH_UNIX security can be used in this kind
> >> of deployment.
> >> 
> > 
> > I know we've discussed this a bit, but does this transport _really_
> > preclude us from using RPCSEC_GSS? I know we don't have IP addresses
> > here, but hosts on either end of a vsocket will have hostnames.
> 
> Yes, even for AUTH_UNIX, something has to go in the "hostname"
> field in the credential. Let's say the guest's uname.
> 
> 
> > WRT kerberos, I don't see a reason why both hosts couldn't communicate
> > with a KDC via other means, get tickets and then use those for
> > authenticating over their vsock connection. vsock might make it harder
> > to determine what SPN to use, but we could potentially work around that
> > in other ways.
> 
> "No network configuration" implies to me that the KDC (or
> a proxy for it) would have to reside on the host.

Their requirement is that network configuration not be mandatory, not
that it always be absent.

Then again maybe rpcsec_gss/vsock loses any advantage over
rpcsec_gss/tcp if the former always requires a network anyway.

> >> Note also that the NFSv4 standards require that implementations
> >> can support RPCSEC_GSS. NFSv4 on VSOCK cannot. Something will
> >> have to be done about that.

Which might be: "Make an argument for why that requirement produces no
useful result in this case". ?

--b.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-11-27 16:46                 ` Bruce Fields
@ 2017-11-27 17:34                   ` Jeff Layton
  2017-11-27 17:37                     ` Matt Benjamin
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Layton @ 2017-11-27 17:34 UTC (permalink / raw)
  To: Bruce Fields, Chuck Lever
  Cc: Stefan Hajnoczi, Linux NFS Mailing List, Abbas Naderi,
	Anna Schumaker, Trond Myklebust

On Mon, 2017-11-27 at 11:46 -0500, Bruce Fields wrote:
> On Sun, Nov 26, 2017 at 10:53:45AM -0500, Chuck Lever wrote:
> > 
> > > On Nov 26, 2017, at 6:58 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > 
> > > On Thu, 2017-11-16 at 15:53 -0500, Chuck Lever wrote:
> > > > > On Nov 16, 2017, at 10:25 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > > 
> > > > > On Tue, Nov 07, 2017 at 09:01:26AM -0500, Jeff Layton wrote:
> > > > > > On Tue, 2017-11-07 at 13:31 +0000, Stefan Hajnoczi wrote:
> > > > > > > On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
> > > > > > > > On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
> > > > > > > > > @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
> > > > > > > > > 	case AF_INET6:
> > > > > > > > > 		return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
> > > > > > > > > 			< PROT_SOCK;
> > > > > > > > > +	case AF_VSOCK:
> > > > > > > > > +		return ((struct sockaddr_vm *)sin)->svm_port <=
> > > > > > > > > +			LAST_RESERVED_PORT;
> > > > > > > > > +
> > > > > > > > > 	default:
> > > > > > > > > 		return 0;
> > > > > > > > > 	}
> > > > > > > > 
> > > > > > > > Does vsock even have the concept of a privileged port? I would imagine
> > > > > > > > that root in a guest VM would carry no particular significance from an
> > > > > > > > export security standpoint
> > > > > > > > 
> > > > > > > > Since you're defining a new transport here, it might be nice write the
> > > > > > > > RFCs to avoid that distinction, if possible.
> > > > > > > > 
> > > > > > > > Note that RDMA just has svc_port_is_privileged always return 1.
> > > > > > > 
> > > > > > > AF_VSOCK has the same 0-1023 privileged port range as TCP.
> > > > > > > 
> > > > > > 
> > > > > > But why? And, given that you have 32-bits for a port with AF_VSOCK vs
> > > > > > the 16 bits on an AF_INET/AF_INET6, why is the range so pitifully small?
> > > > > > 
> > > > > > Reserved ports are a bit of a dinosaur holdover from when being root on
> > > > > > a machine on the Internet meant something. With v4.1 it's much less of
> > > > > > an issue, but in the "olden days", reserved port exhaustion could be a
> > > > > > real problem.
> > > > > > 
> > > > > > Mandating low ports can also be a problem in other way. Some well known
> > > > > > services use ports in the ephemeral range, and if your service starts
> > > > > > late and someone else has taken the port for an ephemeral one, you're
> > > > > > out of luck.
> > > > > > 
> > > > > > I think we have to ask ourselves:
> > > > > > 
> > > > > > Should the ability to open a low port inside of a VM carry any
> > > > > > significance at all to an RPC server? I'd suggest not, and I think it'd
> > > > > > be good to word the RFC to make that explicitly clear.
> > > > > 
> > > > > AF_VSOCK has had the reserved port range since it was first merged in
> > > > > 2013.  That's before my time but I do see some use for identifying
> > > > > connections coming from privileged processes.
> > > > > 
> > > > > Given that TCP has the same privileged port range, is there any reason
> > > > > why AF_VSOCK would be any worse off than TCP for having it?
> > > > 
> > > > I agree with Jeff that we need to think carefully about this.
> > > > 
> > > > I don't especially care for the privileged port check, but:
> > > > 
> > > > In this case, you are inventing an RPC transport that makes
> > > > it impossible to use strong security (ie, RPCSEC_GSS). We
> > > > should be careful about removing the check because only
> > > > AUTH_NULL and AUTH_UNIX security can be used in this kind
> > > > of deployment.
> > > > 
> > > 
> > > I know we've discussed this a bit, but does this transport _really_
> > > preclude us from using RPCSEC_GSS? I know we don't have IP addresses
> > > here, but hosts on either end of a vsocket will have hostnames.
> > 
> > Yes, even for AUTH_UNIX, something has to go in the "hostname"
> > field in the credential. Let's say the guest's uname.
> > 
> > 
> > > WRT kerberos, I don't see a reason why both hosts couldn't communicate
> > > with a KDC via other means, get tickets and then use those for
> > > authenticating over their vsock connection. vsock might make it harder
> > > to determine what SPN to use, but we could potentially work around that
> > > in other ways.
> > 
> > "No network configuration" implies to me that the KDC (or
> > a proxy for it) would have to reside on the host.

A proxy would be fine. The whole point of krb5 is that you can't rely on
 the network anyway...

> 
> Their requirement is that network configuration not be mandatory, not
> that it always be absent.
> 
> Then again maybe rpcsec_gss/vsock loses any advantage over
> rpcsec_gss/tcp if the former always requires a network anyway.
> 
> > > > Note also that the NFSv4 standards require that implementations
> > > > can support RPCSEC_GSS. NFSv4 on VSOCK cannot. Something will
> > > > have to be done about that.
> 
> Which might be: "Make an argument for why that requirement produces no
> useful result in this case". ?


I guess it just seems like we can allow RPCSEC_GSS over this channel,
even if it's not terribly useful today. I don't think we ought to word
in a way that specifically forbids it, unless it really does fall short
of some key requirement.

GSSAPI is more than just krb5 too...maybe LIPKEY will make a comeback
someday? :)

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c
  2017-11-27 17:34                   ` Jeff Layton
@ 2017-11-27 17:37                     ` Matt Benjamin
  0 siblings, 0 replies; 35+ messages in thread
From: Matt Benjamin @ 2017-11-27 17:37 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Bruce Fields, Chuck Lever, Stefan Hajnoczi,
	Linux NFS Mailing List, Abbas Naderi, Anna Schumaker,
	Trond Myklebust

Yes, we should allow for the possibility of GSS mechanisms other than krb5.

Matt

On Mon, Nov 27, 2017 at 12:34 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Mon, 2017-11-27 at 11:46 -0500, Bruce Fields wrote:
>> On Sun, Nov 26, 2017 at 10:53:45AM -0500, Chuck Lever wrote:
>> >
>> > > On Nov 26, 2017, at 6:58 AM, Jeff Layton <jlayton@redhat.com> wrote:
>> > >
>> > > On Thu, 2017-11-16 at 15:53 -0500, Chuck Lever wrote:
>> > > > > On Nov 16, 2017, at 10:25 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>> > > > >
>> > > > > On Tue, Nov 07, 2017 at 09:01:26AM -0500, Jeff Layton wrote:
>> > > > > > On Tue, 2017-11-07 at 13:31 +0000, Stefan Hajnoczi wrote:
>> > > > > > > On Tue, Oct 31, 2017 at 10:10:38AM -0400, Jeff Layton wrote:
>> > > > > > > > On Fri, 2017-06-30 at 14:23 +0100, Stefan Hajnoczi wrote:
>> > > > > > > > > @@ -595,6 +609,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
>> > > > > > > > >       case AF_INET6:
>> > > > > > > > >               return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
>> > > > > > > > >                       < PROT_SOCK;
>> > > > > > > > > +     case AF_VSOCK:
>> > > > > > > > > +             return ((struct sockaddr_vm *)sin)->svm_port <=
>> > > > > > > > > +                     LAST_RESERVED_PORT;
>> > > > > > > > > +
>> > > > > > > > >       default:
>> > > > > > > > >               return 0;
>> > > > > > > > >       }
>> > > > > > > >
>> > > > > > > > Does vsock even have the concept of a privileged port? I would imagine
>> > > > > > > > that root in a guest VM would carry no particular significance from an
>> > > > > > > > export security standpoint
>> > > > > > > >
>> > > > > > > > Since you're defining a new transport here, it might be nice write the
>> > > > > > > > RFCs to avoid that distinction, if possible.
>> > > > > > > >
>> > > > > > > > Note that RDMA just has svc_port_is_privileged always return 1.
>> > > > > > >
>> > > > > > > AF_VSOCK has the same 0-1023 privileged port range as TCP.
>> > > > > > >
>> > > > > >
>> > > > > > But why? And, given that you have 32-bits for a port with AF_VSOCK vs
>> > > > > > the 16 bits on an AF_INET/AF_INET6, why is the range so pitifully small?
>> > > > > >
>> > > > > > Reserved ports are a bit of a dinosaur holdover from when being root on
>> > > > > > a machine on the Internet meant something. With v4.1 it's much less of
>> > > > > > an issue, but in the "olden days", reserved port exhaustion could be a
>> > > > > > real problem.
>> > > > > >
>> > > > > > Mandating low ports can also be a problem in other way. Some well known
>> > > > > > services use ports in the ephemeral range, and if your service starts
>> > > > > > late and someone else has taken the port for an ephemeral one, you're
>> > > > > > out of luck.
>> > > > > >
>> > > > > > I think we have to ask ourselves:
>> > > > > >
>> > > > > > Should the ability to open a low port inside of a VM carry any
>> > > > > > significance at all to an RPC server? I'd suggest not, and I think it'd
>> > > > > > be good to word the RFC to make that explicitly clear.
>> > > > >
>> > > > > AF_VSOCK has had the reserved port range since it was first merged in
>> > > > > 2013.  That's before my time but I do see some use for identifying
>> > > > > connections coming from privileged processes.
>> > > > >
>> > > > > Given that TCP has the same privileged port range, is there any reason
>> > > > > why AF_VSOCK would be any worse off than TCP for having it?
>> > > >
>> > > > I agree with Jeff that we need to think carefully about this.
>> > > >
>> > > > I don't especially care for the privileged port check, but:
>> > > >
>> > > > In this case, you are inventing an RPC transport that makes
>> > > > it impossible to use strong security (ie, RPCSEC_GSS). We
>> > > > should be careful about removing the check because only
>> > > > AUTH_NULL and AUTH_UNIX security can be used in this kind
>> > > > of deployment.
>> > > >
>> > >
>> > > I know we've discussed this a bit, but does this transport _really_
>> > > preclude us from using RPCSEC_GSS? I know we don't have IP addresses
>> > > here, but hosts on either end of a vsocket will have hostnames.
>> >
>> > Yes, even for AUTH_UNIX, something has to go in the "hostname"
>> > field in the credential. Let's say the guest's uname.
>> >
>> >
>> > > WRT kerberos, I don't see a reason why both hosts couldn't communicate
>> > > with a KDC via other means, get tickets and then use those for
>> > > authenticating over their vsock connection. vsock might make it harder
>> > > to determine what SPN to use, but we could potentially work around that
>> > > in other ways.
>> >
>> > "No network configuration" implies to me that the KDC (or
>> > a proxy for it) would have to reside on the host.
>
> A proxy would be fine. The whole point of krb5 is that you can't rely on
>  the network anyway...
>
>>
>> Their requirement is that network configuration not be mandatory, not
>> that it always be absent.
>>
>> Then again maybe rpcsec_gss/vsock loses any advantage over
>> rpcsec_gss/tcp if the former always requires a network anyway.
>>
>> > > > Note also that the NFSv4 standards require that implementations
>> > > > can support RPCSEC_GSS. NFSv4 on VSOCK cannot. Something will
>> > > > have to be done about that.
>>
>> Which might be: "Make an argument for why that requirement produces no
>> useful result in this case". ?
>
>
> I guess it just seems like we can allow RPCSEC_GSS over this channel,
> even if it's not terribly useful today. I don't think we ought to word
> in a way that specifically forbids it, unless it really does fall short
> of some key requirement.
>
> GSSAPI is more than just krb5 too...maybe LIPKEY will make a comeback
> someday? :)
>
> --
> Jeff Layton <jlayton@redhat.com>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2017-11-27 17:37 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-30 13:23 [PATCH v3 00/14] NFS: add AF_VSOCK support Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 01/14] SUNRPC: add AF_VSOCK support to addr.[ch] Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 02/14] SUNRPC: rename "TCP" record parser to "stream" parser Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 03/14] SUNRPC: abstract tcp_read_sock() in record fragment parser Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 04/14] SUNRPC: extract xs_stream_reset_state() Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 05/14] VSOCK: add tcp_read_sock()-like vsock_read_sock() function Stefan Hajnoczi
2017-10-31 13:35   ` Jeff Layton
2017-11-07 13:32     ` Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 06/14] SUNRPC: add AF_VSOCK support to xprtsock.c Stefan Hajnoczi
2017-11-07 13:46   ` Jeff Layton
2017-11-14 16:45     ` Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 07/14] SUNRPC: drop unnecessary svc_bc_tcp_create() helper Stefan Hajnoczi
2017-10-31 13:55   ` Jeff Layton
2017-06-30 13:23 ` [PATCH v3 08/14] SUNRPC: add AF_VSOCK support to svc_xprt.c Stefan Hajnoczi
2017-10-31 14:10   ` Jeff Layton
2017-11-07 13:31     ` Stefan Hajnoczi
2017-11-07 14:01       ` Jeff Layton
2017-11-16 15:25         ` Stefan Hajnoczi
2017-11-16 20:53           ` Chuck Lever
2017-11-20 16:31             ` Stefan Hajnoczi
2017-11-26 11:58             ` Jeff Layton
2017-11-26 15:53               ` Chuck Lever
2017-11-27 16:46                 ` Bruce Fields
2017-11-27 17:34                   ` Jeff Layton
2017-11-27 17:37                     ` Matt Benjamin
2017-06-30 13:23 ` [PATCH v3 09/14] SUNRPC: add AF_VSOCK backchannel support Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 10/14] NFS: add AF_VSOCK support to NFS client Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 11/14] nfsd: support vsock xprt creation Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 12/14] SUNRPC: add AF_VSOCK lock class Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 13/14] SUNRPC: vsock svcsock support Stefan Hajnoczi
2017-11-07 14:12   ` Jeff Layton
2017-11-14 14:20     ` Stefan Hajnoczi
2017-06-30 13:23 ` [PATCH v3 14/14] SUNRPC: add AF_VSOCK support to auth.unix.ip Stefan Hajnoczi
2017-07-06 18:46   ` Abbas Naderi
2017-07-10 18:05     ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.