* [RFC v4 0/5] Add virtio transport for AF_VSOCK
@ 2015-12-22  9:07 Stefan Hajnoczi
From: Stefan Hajnoczi @ 2015-12-22  9:07 UTC (permalink / raw)
  To: kvm
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, netdev, virtualization,
	Matt Benjamin, Christoffer Dall, matt.ma

This series is based on v4.4-rc2 and the "virtio: make find_vqs()
checkpatch.pl-friendly" patch I recently submitted.

v4:
 * Addressed code review comments from Alex Bennee
 * MAINTAINERS file entries for new files
 * Trace events instead of pr_debug()
 * RST packet is sent when there is no listen socket
 * Allow guest->host connections again (began discussing netfilter support with
   Matt Benjamin instead of hard-coding security policy in virtio-vsock code)
 * Many checkpatch.pl cleanups (will be 100% clean in v5)

v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
   (also drop v2 Patch 1, it's only needed for SOCK_DGRAM)
 * Only allow host->guest connections (same security model as latest
   VMware)
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann <arnd@arndb.de>)
 * Remove unneeded variable used to store return value
   (Fengguang Wu <fengguang.wu@intel.com> and Julia Lawall
   <julia.lawall@lip6.fr>)

v2:
 * Rebased onto Linux v4.4-rc2
 * vhost: Refuse to assign reserved CIDs
 * vhost: Refuse guest CID if already in use
 * vhost: Only accept correctly addressed packets (no spoofing!)
 * vhost: Support flexible rx/tx descriptor layout
 * vhost: Add missing total_tx_buf decrement
 * virtio_transport: Fix total_tx_buf accounting
 * virtio_transport: Add virtio_transport global mutex to prevent races
 * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
 * common: Fix peer_buf_alloc inheritance on child socket

This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors.  It is currently implemented only for VMware's VMCI transport.

This series implements the proposed virtio-vsock device specification from
here:
http://permalink.gmane.org/gmane.comp.emulators.virtio.devel/980

Most of the work was done by Asias He and Gerd Hoffmann a while back.  I have
picked up the series again.

The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock

Why virtio-vsock?
-----------------
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications, and communication
is limited to a static set of ports.

virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM semantics.  Applications on the host can easily connect to guest
agents because the sockets API allows multiple connections to a listen socket
(unlike virtio-serial).  This simplifies the guest<->host communication and
eliminates the need for extra processes on the host to arbitrate virtio-serial
ports.
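
For example, a guest agent can accept host connections with ordinary
sockets calls.  The sketch below (a hypothetical helper with error
handling omitted) uses only the existing AF_VSOCK definitions from
<linux/vm_sockets.h>, nothing new from this series:

  #include <sys/socket.h>
  #include <linux/vm_sockets.h>

  int accept_from_host(unsigned int port)
  {
          struct sockaddr_vm addr = {
                  .svm_family = AF_VSOCK,
                  .svm_cid = VMADDR_CID_ANY, /* any local CID */
                  .svm_port = port,
          };
          int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

          bind(fd, (struct sockaddr *)&addr, sizeof(addr));
          listen(fd, 1);
          return accept(fd, NULL, NULL); /* connection from the host */
  }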

Overview
--------
This series adds 3 pieces:

1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko

2. virtio_transport.ko - guest driver

3. drivers/vhost/vsock.ko - host driver
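
Roughly, the pieces stack as follows: the guest and host drivers both sit
on the common code, which in turn plugs into the AF_VSOCK core:

  virtio_transport.ko (guest)     vhost/vsock.ko (host)
               \                       /
              virtio_transport_common.ko
                          |
                  vsock.ko (AF_VSOCK core)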

Howto
-----
The following kernel options are needed:
  CONFIG_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS_COMMON=y
  CONFIG_VHOST_VSOCK=m

Launch QEMU as follows:
  # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3

Guest and host can communicate via AF_VSOCK sockets.  The host's CID (address)
is 2 and the guest must be assigned a CID (3 in the example above).
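
For example, a host application could then connect to a guest agent
listening on port 1234 (a minimal sketch, error handling omitted; the
CID of 3 matches the guest-cid= value above, and the port number is an
arbitrary example):

  #include <sys/socket.h>
  #include <linux/vm_sockets.h>

  int connect_to_guest(void)
  {
          struct sockaddr_vm addr = {
                  .svm_family = AF_VSOCK,
                  .svm_cid = 3,     /* guest-cid= from the QEMU command line */
                  .svm_port = 1234, /* arbitrary example port */
          };
          int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

          if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                  return -1;
          return fd;
  }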

Status
------
This patch series implements the latest draft specification.  Please review.

Asias He (4):
  VSOCK: Introduce virtio_vsock_common.ko
  VSOCK: Introduce virtio_transport.ko
  VSOCK: Introduce vhost_vsock.ko
  VSOCK: Add Makefile and Kconfig

Stefan Hajnoczi (1):
  VSOCK: transport-specific vsock_transport functions

 MAINTAINERS                                        |  13 +
 drivers/vhost/Kconfig                              |  15 +
 drivers/vhost/Makefile                             |   4 +
 drivers/vhost/vsock.c                              | 607 +++++++++++++++
 drivers/vhost/vsock.h                              |   4 +
 include/linux/virtio_vsock.h                       | 167 +++++
 include/net/af_vsock.h                             |   3 +
 .../trace/events/vsock_virtio_transport_common.h   | 144 ++++
 include/uapi/linux/virtio_ids.h                    |   1 +
 include/uapi/linux/virtio_vsock.h                  |  87 +++
 net/vmw_vsock/Kconfig                              |  19 +
 net/vmw_vsock/Makefile                             |   2 +
 net/vmw_vsock/af_vsock.c                           |   9 +
 net/vmw_vsock/virtio_transport.c                   | 481 ++++++++++++
 net/vmw_vsock/virtio_transport_common.c            | 834 +++++++++++++++++++++
 15 files changed, 2390 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/trace/events/vsock_virtio_transport_common.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport.c
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

-- 
2.5.0

* [RFC v4 1/5] VSOCK: transport-specific vsock_transport functions
From: Stefan Hajnoczi @ 2015-12-22  9:07 UTC (permalink / raw)
  To: kvm
  Cc: Matt Benjamin, Christoffer Dall, netdev, Michael S. Tsirkin,
	matt.ma, virtualization, Stefan Hajnoczi

struct vsock_transport contains function pointers called by AF_VSOCK
core code.  A transport may want additional transport-specific function
pointers; these can live in a larger structure that embeds struct
vsock_transport as its first field.

Allow the transport to fetch the registered struct vsock_transport.  It
can downcast the pointer to access its transport-specific function
pointers.

The virtio transport will use this.
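
For example, patch 2 embeds struct vsock_transport as the first field
of struct virtio_transport and recovers the outer structure with
container_of(); the following sketch mirrors the code added later in
this series:

  struct virtio_transport {
          /* This must be the first field */
          struct vsock_transport transport;

          /* transport-specific function pointers follow */
  };

  static const struct virtio_transport *virtio_transport_get_ops(void)
  {
          const struct vsock_transport *t = vsock_core_get_transport();

          return container_of(t, struct virtio_transport, transport);
  }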

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/net/af_vsock.h   | 3 +++
 net/vmw_vsock/af_vsock.c | 9 +++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index e9eb2d6..23f5525 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -165,6 +165,9 @@ static inline int vsock_core_init(const struct vsock_transport *t)
 }
 void vsock_core_exit(void);
 
+/* The transport may downcast this to access transport-specific functions */
+const struct vsock_transport *vsock_core_get_transport(void);
+
 /**** UTILS ****/
 
 void vsock_release_pending(struct sock *pending);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 7fd1220..9783a38 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1994,6 +1994,15 @@ void vsock_core_exit(void)
 }
 EXPORT_SYMBOL_GPL(vsock_core_exit);
 
+const struct vsock_transport *vsock_core_get_transport(void)
+{
+	/* vsock_register_mutex not taken since only the transport uses this
+	 * function and only while registered.
+	 */
+	return transport;
+}
+EXPORT_SYMBOL_GPL(vsock_core_get_transport);
+
 MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION("VMware Virtual Socket Family");
 MODULE_VERSION("1.0.1.0-k");
-- 
2.5.0

* [RFC v4 2/5] VSOCK: Introduce virtio_vsock_common.ko
From: Stefan Hajnoczi @ 2015-12-22  9:07 UTC (permalink / raw)
  To: kvm
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, netdev, virtualization,
	Matt Benjamin, Asias He, Christoffer Dall, matt.ma

From: Asias He <asias@redhat.com>

This module contains the common code and header files used by the
virtio_transport and vhost_vsock kernel modules introduced in the
following patches.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v4:
 * Add MAINTAINERS file entry
 * checkpatch.pl cleanups
 * linux_vsock.h: drop wrong copy-pasted license header
 * Move tx sock refcounting to virtio_transport_alloc/free_pkt() to fix
   leaks in error paths
 * Add send_pkt_no_sock() to send RST packets with no listen socket
 * Rename per-socket state from virtio_transport to virtio_vsock_sock
 * Move send_pkt_ops to new virtio_transport struct
 * Drop dumppkt function, packet capture will be added in the future
 * Drop empty virtio_transport_dec_tx_pkt()
 * Allow guest->host connections again
 * Use trace events instead of pr_debug()
v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
 * Only allow host->guest connections (same security model as latest
   VMware)
v2:
 * Fix peer_buf_alloc inheritance on child socket
 * Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
---
 MAINTAINERS                                        |  10 +
 include/linux/virtio_vsock.h                       | 167 +++++
 .../trace/events/vsock_virtio_transport_common.h   | 144 ++++
 include/uapi/linux/virtio_ids.h                    |   1 +
 include/uapi/linux/virtio_vsock.h                  |  87 +++
 net/vmw_vsock/virtio_transport_common.c            | 834 +++++++++++++++++++++
 6 files changed, 1243 insertions(+)
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/trace/events/vsock_virtio_transport_common.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 050d0e7..d42db78 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11360,6 +11360,16 @@ S:	Maintained
 F:	drivers/media/v4l2-core/videobuf2-*
 F:	include/media/videobuf2-*
 
+VIRTIO AND VHOST VSOCK DRIVER
+M:	Stefan Hajnoczi <stefanha@redhat.com>
+L:	kvm@vger.kernel.org
+L:	virtualization@lists.linux-foundation.org
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	include/linux/virtio_vsock.h
+F:	include/uapi/linux/virtio_vsock.h
+F:	net/vmw_vsock/virtio_transport_common.c
+
 VIRTUAL SERIO DEVICE DRIVER
 M:	Stephen Chandler Paul <thatslyude@gmail.com>
 S:	Maintained
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
new file mode 100644
index 0000000..4acf1ad
--- /dev/null
+++ b/include/linux/virtio_vsock.h
@@ -0,0 +1,167 @@
+#ifndef _LINUX_VIRTIO_VSOCK_H
+#define _LINUX_VIRTIO_VSOCK_H
+
+#include <uapi/linux/virtio_vsock.h>
+#include <linux/socket.h>
+#include <net/sock.h>
+#include <net/af_vsock.h>
+
+#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
+#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
+#define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
+#define VIRTIO_VSOCK_MAX_TX_BUF_SIZE		(1024 * 1024 * 16)
+#define VIRTIO_VSOCK_MAX_DGRAM_SIZE		(1024 * 64)
+
+enum {
+	VSOCK_VQ_CTRL	= 0,
+	VSOCK_VQ_RX	= 1, /* for host to guest data */
+	VSOCK_VQ_TX	= 2, /* for guest to host data */
+	VSOCK_VQ_MAX	= 3,
+};
+
+/* Per-socket state (accessed via vsk->trans) */
+struct virtio_vsock_sock {
+	struct vsock_sock *vsk;
+
+	/* Protected by lock_sock(sk_vsock(trans->vsk)) */
+	u32 buf_size;
+	u32 buf_size_min;
+	u32 buf_size_max;
+
+	struct mutex tx_lock;
+	struct mutex rx_lock;
+
+	/* Protected by tx_lock */
+	u32 tx_cnt;
+	u32 buf_alloc;
+	u32 peer_fwd_cnt;
+	u32 peer_buf_alloc;
+
+	/* Protected by rx_lock */
+	u32 fwd_cnt;
+	u32 rx_bytes;
+	struct list_head rx_queue;
+};
+
+struct virtio_vsock_pkt {
+	struct virtio_vsock_hdr	hdr;
+	struct work_struct work;
+	struct list_head list;
+	void *buf;
+	u32 len;
+	u32 off;
+};
+
+struct virtio_vsock_pkt_info {
+	u32 remote_cid, remote_port;
+	struct msghdr *msg;
+	u32 pkt_len;
+	u16 type;
+	u16 op;
+	u32 flags;
+};
+
+struct virtio_transport {
+	/* This must be the first field */
+	struct vsock_transport transport;
+
+	/* Send packet for a specific socket */
+	int (*send_pkt)(struct vsock_sock *vsk,
+			struct virtio_vsock_pkt_info *info);
+
+	/* Send packet without a socket (e.g. RST).  Prefer send_pkt() over
+	 * send_pkt_no_sock() when a socket exists.
+	 */
+	int (*send_pkt_no_sock)(struct virtio_vsock_pkt *pkt);
+};
+
+struct virtio_vsock_pkt *
+virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
+			   size_t len,
+			   u32 src_cid,
+			   u32 src_port,
+			   u32 dst_cid,
+			   u32 dst_port);
+ssize_t
+virtio_transport_stream_dequeue(struct vsock_sock *vsk,
+				struct msghdr *msg,
+				size_t len,
+				int flags);
+int
+virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
+			       struct msghdr *msg,
+			       size_t len, int flags);
+
+s64 virtio_transport_stream_has_data(struct vsock_sock *vsk);
+s64 virtio_transport_stream_has_space(struct vsock_sock *vsk);
+
+int virtio_transport_do_socket_init(struct vsock_sock *vsk,
+				 struct vsock_sock *psk);
+u64 virtio_transport_get_buffer_size(struct vsock_sock *vsk);
+u64 virtio_transport_get_min_buffer_size(struct vsock_sock *vsk);
+u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk);
+void virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val);
+void virtio_transport_set_min_buffer_size(struct vsock_sock *vsk, u64 val);
+void virtio_transport_set_max_buffer_size(struct vsock_sock *vsk, u64 val);
+int
+virtio_transport_notify_poll_in(struct vsock_sock *vsk,
+				size_t target,
+				bool *data_ready_now);
+int
+virtio_transport_notify_poll_out(struct vsock_sock *vsk,
+				 size_t target,
+				 bool *space_available_now);
+
+int virtio_transport_notify_recv_init(struct vsock_sock *vsk,
+	size_t target, struct vsock_transport_recv_notify_data *data);
+int virtio_transport_notify_recv_pre_block(struct vsock_sock *vsk,
+	size_t target, struct vsock_transport_recv_notify_data *data);
+int virtio_transport_notify_recv_pre_dequeue(struct vsock_sock *vsk,
+	size_t target, struct vsock_transport_recv_notify_data *data);
+int virtio_transport_notify_recv_post_dequeue(struct vsock_sock *vsk,
+	size_t target, ssize_t copied, bool data_read,
+	struct vsock_transport_recv_notify_data *data);
+int virtio_transport_notify_send_init(struct vsock_sock *vsk,
+	struct vsock_transport_send_notify_data *data);
+int virtio_transport_notify_send_pre_block(struct vsock_sock *vsk,
+	struct vsock_transport_send_notify_data *data);
+int virtio_transport_notify_send_pre_enqueue(struct vsock_sock *vsk,
+	struct vsock_transport_send_notify_data *data);
+int virtio_transport_notify_send_post_enqueue(struct vsock_sock *vsk,
+	ssize_t written, struct vsock_transport_send_notify_data *data);
+
+u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
+bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
+bool virtio_transport_stream_allow(u32 cid, u32 port);
+int virtio_transport_dgram_bind(struct vsock_sock *vsk,
+				struct sockaddr_vm *addr);
+bool virtio_transport_dgram_allow(u32 cid, u32 port);
+
+int virtio_transport_connect(struct vsock_sock *vsk);
+
+int virtio_transport_shutdown(struct vsock_sock *vsk, int mode);
+
+void virtio_transport_release(struct vsock_sock *vsk);
+
+ssize_t
+virtio_transport_stream_enqueue(struct vsock_sock *vsk,
+				struct msghdr *msg,
+				size_t len);
+int
+virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
+			       struct sockaddr_vm *remote_addr,
+			       struct msghdr *msg,
+			       size_t len);
+
+void virtio_transport_destruct(struct vsock_sock *vsk);
+
+void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt);
+void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt);
+void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt);
+u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted);
+void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
+
+#endif /* _LINUX_VIRTIO_VSOCK_H */
diff --git a/include/trace/events/vsock_virtio_transport_common.h b/include/trace/events/vsock_virtio_transport_common.h
new file mode 100644
index 0000000..b7f1d62
--- /dev/null
+++ b/include/trace/events/vsock_virtio_transport_common.h
@@ -0,0 +1,144 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM vsock
+
+#if !defined(_TRACE_VSOCK_VIRTIO_TRANSPORT_COMMON_H) || \
+    defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_VSOCK_VIRTIO_TRANSPORT_COMMON_H
+
+#include <linux/tracepoint.h>
+
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_STREAM);
+
+#define show_type(val) \
+	__print_symbolic(val, { VIRTIO_VSOCK_TYPE_STREAM, "STREAM" })
+
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_INVALID);
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_REQUEST);
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_RESPONSE);
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_RST);
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_SHUTDOWN);
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_RW);
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_CREDIT_UPDATE);
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_CREDIT_REQUEST);
+
+#define show_op(val) \
+	__print_symbolic(val, \
+			 { VIRTIO_VSOCK_OP_INVALID, "INVALID" }, \
+			 { VIRTIO_VSOCK_OP_REQUEST, "REQUEST" }, \
+			 { VIRTIO_VSOCK_OP_RESPONSE, "RESPONSE" }, \
+			 { VIRTIO_VSOCK_OP_RST, "RST" }, \
+			 { VIRTIO_VSOCK_OP_SHUTDOWN, "SHUTDOWN" }, \
+			 { VIRTIO_VSOCK_OP_RW, "RW" }, \
+			 { VIRTIO_VSOCK_OP_CREDIT_UPDATE, "CREDIT_UPDATE" }, \
+			 { VIRTIO_VSOCK_OP_CREDIT_REQUEST, "CREDIT_REQUEST" })
+
+TRACE_EVENT(virtio_transport_alloc_pkt,
+	TP_PROTO(
+		 __u32 src_cid, __u32 src_port,
+		 __u32 dst_cid, __u32 dst_port,
+		 __u32 len,
+		 __u16 type,
+		 __u16 op,
+		 __u32 flags
+	),
+	TP_ARGS(
+		src_cid, src_port,
+		dst_cid, dst_port,
+		len,
+		type,
+		op,
+		flags
+	),
+	TP_STRUCT__entry(
+		__field(__u32, src_cid)
+		__field(__u32, src_port)
+		__field(__u32, dst_cid)
+		__field(__u32, dst_port)
+		__field(__u32, len)
+		__field(__u16, type)
+		__field(__u16, op)
+		__field(__u32, flags)
+	),
+	TP_fast_assign(
+		__entry->src_cid = src_cid;
+		__entry->src_port = src_port;
+		__entry->dst_cid = dst_cid;
+		__entry->dst_port = dst_port;
+		__entry->len = len;
+		__entry->type = type;
+		__entry->op = op;
+		__entry->flags = flags;
+	),
+	TP_printk("%u:%u -> %u:%u len=%u type=%s op=%s flags=%#x",
+		  __entry->src_cid, __entry->src_port,
+		  __entry->dst_cid, __entry->dst_port,
+		  __entry->len,
+		  show_type(__entry->type),
+		  show_op(__entry->op),
+		  __entry->flags)
+);
+
+TRACE_EVENT(virtio_transport_recv_pkt,
+	TP_PROTO(
+		 __u32 src_cid, __u32 src_port,
+		 __u32 dst_cid, __u32 dst_port,
+		 __u32 len,
+		 __u16 type,
+		 __u16 op,
+		 __u32 flags,
+		 __u32 buf_alloc,
+		 __u32 fwd_cnt
+	),
+	TP_ARGS(
+		src_cid, src_port,
+		dst_cid, dst_port,
+		len,
+		type,
+		op,
+		flags,
+		buf_alloc,
+		fwd_cnt
+	),
+	TP_STRUCT__entry(
+		__field(__u32, src_cid)
+		__field(__u32, src_port)
+		__field(__u32, dst_cid)
+		__field(__u32, dst_port)
+		__field(__u32, len)
+		__field(__u16, type)
+		__field(__u16, op)
+		__field(__u32, flags)
+		__field(__u32, buf_alloc)
+		__field(__u32, fwd_cnt)
+	),
+	TP_fast_assign(
+		__entry->src_cid = src_cid;
+		__entry->src_port = src_port;
+		__entry->dst_cid = dst_cid;
+		__entry->dst_port = dst_port;
+		__entry->len = len;
+		__entry->type = type;
+		__entry->op = op;
+		__entry->flags = flags;
+		__entry->buf_alloc = buf_alloc;
+		__entry->fwd_cnt = fwd_cnt;
+	),
+	TP_printk("%u:%u -> %u:%u len=%u type=%s op=%s flags=%#x "
+		  "buf_alloc=%u fwd_cnt=%u",
+		  __entry->src_cid, __entry->src_port,
+		  __entry->dst_cid, __entry->dst_port,
+		  __entry->len,
+		  show_type(__entry->type),
+		  show_op(__entry->op),
+		  __entry->flags,
+		  __entry->buf_alloc,
+		  __entry->fwd_cnt)
+);
+
+#endif /* _TRACE_VSOCK_VIRTIO_TRANSPORT_COMMON_H */
+
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE vsock_virtio_transport_common
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 77925f5..16dcf5d 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -39,6 +39,7 @@
 #define VIRTIO_ID_9P		9 /* 9p virtio console */
 #define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
 #define VIRTIO_ID_CAIF	       12 /* Virtio caif */
+#define VIRTIO_ID_VSOCK        13 /* virtio vsock transport */
 #define VIRTIO_ID_GPU          16 /* virtio GPU */
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
new file mode 100644
index 0000000..ac6483d
--- /dev/null
+++ b/include/uapi/linux/virtio_vsock.h
@@ -0,0 +1,87 @@
+/*
+ * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
+ * anyone can use the definitions to implement compatible drivers/servers:
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Copyright (C) Red Hat, Inc., 2013-2015
+ * Copyright (C) Asias He <asias@redhat.com>, 2013
+ * Copyright (C) Stefan Hajnoczi <stefanha@redhat.com>, 2015
+ */
+
+#ifndef _UAPI_LINUX_VIRTIO_VSOCK_H
+#define _UAPI_LINUX_VIRTIO_VSOCK_H
+
+#include <linux/types.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+
+struct virtio_vsock_config {
+	__le32 guest_cid;
+	__le32 max_virtqueue_pairs;
+};
+
+struct virtio_vsock_hdr {
+	__le32	src_cid;
+	__le32	src_port;
+	__le32	dst_cid;
+	__le32	dst_port;
+	__le32	len;
+	__le16	type;		/* enum virtio_vsock_type */
+	__le16	op;		/* enum virtio_vsock_op */
+	__le32	flags;
+	__le32	buf_alloc;
+	__le32	fwd_cnt;
+};
+
+enum virtio_vsock_type {
+	VIRTIO_VSOCK_TYPE_STREAM = 1,
+};
+
+enum virtio_vsock_op {
+	VIRTIO_VSOCK_OP_INVALID = 0,
+
+	/* Connect operations */
+	VIRTIO_VSOCK_OP_REQUEST = 1,
+	VIRTIO_VSOCK_OP_RESPONSE = 2,
+	VIRTIO_VSOCK_OP_RST = 3,
+	VIRTIO_VSOCK_OP_SHUTDOWN = 4,
+
+	/* To send payload */
+	VIRTIO_VSOCK_OP_RW = 5,
+
+	/* Tell the peer our credit info */
+	VIRTIO_VSOCK_OP_CREDIT_UPDATE = 6,
+	/* Request the peer to send the credit info to us */
+	VIRTIO_VSOCK_OP_CREDIT_REQUEST = 7,
+};
+
+/* VIRTIO_VSOCK_OP_SHUTDOWN flags values */
+enum virtio_vsock_shutdown {
+	VIRTIO_VSOCK_SHUTDOWN_RCV = 1,
+	VIRTIO_VSOCK_SHUTDOWN_SEND = 2,
+};
+
+#endif /* _UAPI_LINUX_VIRTIO_VSOCK_H */
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
new file mode 100644
index 0000000..000f12b
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -0,0 +1,834 @@
+/*
+ * common code for virtio vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <asias@redhat.com>
+ *         Stefan Hajnoczi <stefanha@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include <linux/module.h>
+#include <linux/ctype.h>
+#include <linux/list.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_vsock.h>
+
+#include <net/sock.h>
+#include <net/af_vsock.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/vsock_virtio_transport_common.h>
+
+static const struct virtio_transport *virtio_transport_get_ops(void)
+{
+	const struct vsock_transport *t = vsock_core_get_transport();
+
+	return container_of(t, struct virtio_transport, transport);
+}
+
+static int virtio_transport_send_pkt(struct vsock_sock *vsk,
+				     struct virtio_vsock_pkt_info *info)
+{
+	return virtio_transport_get_ops()->send_pkt(vsk, info);
+}
+
+static int virtio_transport_send_pkt_no_sock(struct virtio_vsock_pkt *pkt)
+{
+	return virtio_transport_get_ops()->send_pkt_no_sock(pkt);
+}
+
+struct virtio_vsock_pkt *
+virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
+			   size_t len,
+			   u32 src_cid,
+			   u32 src_port,
+			   u32 dst_cid,
+			   u32 dst_port)
+{
+	struct virtio_vsock_pkt *pkt;
+	int err;
+
+	pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
+	if (!pkt)
+		return NULL;
+
+	pkt->hdr.type		= cpu_to_le16(info->type);
+	pkt->hdr.op		= cpu_to_le16(info->op);
+	pkt->hdr.src_cid	= cpu_to_le32(src_cid);
+	pkt->hdr.src_port	= cpu_to_le32(src_port);
+	pkt->hdr.dst_cid	= cpu_to_le32(dst_cid);
+	pkt->hdr.dst_port	= cpu_to_le32(dst_port);
+	pkt->hdr.flags		= cpu_to_le32(info->flags);
+	pkt->len		= len;
+	pkt->hdr.len		= cpu_to_le32(len);
+
+	if (info->msg && len > 0) {
+		pkt->buf = kmalloc(len, GFP_KERNEL);
+		if (!pkt->buf)
+			goto out_pkt;
+		err = memcpy_from_msg(pkt->buf, info->msg, len);
+		if (err)
+			goto out;
+	}
+
+	trace_virtio_transport_alloc_pkt(src_cid, src_port,
+					 dst_cid, dst_port,
+					 len,
+					 info->type,
+					 info->op,
+					 info->flags);
+
+	return pkt;
+
+out:
+	kfree(pkt->buf);
+out_pkt:
+	kfree(pkt);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_alloc_pkt);
+
+static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
+					struct virtio_vsock_pkt *pkt)
+{
+	vvs->rx_bytes += pkt->len;
+}
+
+static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
+					struct virtio_vsock_pkt *pkt)
+{
+	vvs->rx_bytes -= pkt->len;
+	vvs->fwd_cnt += pkt->len;
+}
+
+void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
+{
+	mutex_lock(&vvs->tx_lock);
+	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
+	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
+	mutex_unlock(&vvs->tx_lock);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
+
+u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
+{
+	u32 ret;
+
+	mutex_lock(&vvs->tx_lock);
+	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
+	if (ret > credit)
+		ret = credit;
+	vvs->tx_cnt += ret;
+	mutex_unlock(&vvs->tx_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_get_credit);
+
+void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit)
+{
+	mutex_lock(&vvs->tx_lock);
+	vvs->tx_cnt -= credit;
+	mutex_unlock(&vvs->tx_lock);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_put_credit);
+
+static int virtio_transport_send_credit_update(struct vsock_sock *vsk,
+					       int type,
+					       struct virtio_vsock_hdr *hdr)
+{
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
+		.type = type,
+	};
+
+	return virtio_transport_send_pkt(vsk, &info);
+}
+
+static ssize_t
+virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
+				   struct msghdr *msg,
+				   size_t len)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	struct virtio_vsock_pkt *pkt;
+	size_t bytes, total = 0;
+	int err = -EFAULT;
+
+	mutex_lock(&vvs->rx_lock);
+	while (total < len &&
+	       vvs->rx_bytes > 0 &&
+	       !list_empty(&vvs->rx_queue)) {
+		pkt = list_first_entry(&vvs->rx_queue,
+				       struct virtio_vsock_pkt, list);
+
+		bytes = len - total;
+		if (bytes > pkt->len - pkt->off)
+			bytes = pkt->len - pkt->off;
+
+		err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
+		if (err)
+			goto out;
+		total += bytes;
+		pkt->off += bytes;
+		if (pkt->off == pkt->len) {
+			virtio_transport_dec_rx_pkt(vvs, pkt);
+			list_del(&pkt->list);
+			virtio_transport_free_pkt(pkt);
+		}
+	}
+	mutex_unlock(&vvs->rx_lock);
+
+	/* Send a credit pkt to peer */
+	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
+					    NULL);
+
+	return total;
+
+out:
+	mutex_unlock(&vvs->rx_lock);
+	if (total)
+		err = total;
+	return err;
+}
+
+ssize_t
+virtio_transport_stream_dequeue(struct vsock_sock *vsk,
+				struct msghdr *msg,
+				size_t len, int flags)
+{
+	if (flags & MSG_PEEK)
+		return -EOPNOTSUPP;
+
+	return virtio_transport_stream_do_dequeue(vsk, msg, len);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_stream_dequeue);
+
+int
+virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
+			       struct msghdr *msg,
+			       size_t len, int flags)
+{
+	return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
+
+s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	s64 bytes;
+
+	mutex_lock(&vvs->rx_lock);
+	bytes = vvs->rx_bytes;
+	mutex_unlock(&vvs->rx_lock);
+
+	return bytes;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_stream_has_data);
+
+static s64 virtio_transport_has_space(struct vsock_sock *vsk)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	s64 bytes;
+
+	bytes = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
+	if (bytes < 0)
+		bytes = 0;
+
+	return bytes;
+}
+
+s64 virtio_transport_stream_has_space(struct vsock_sock *vsk)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	s64 bytes;
+
+	mutex_lock(&vvs->tx_lock);
+	bytes = virtio_transport_has_space(vsk);
+	mutex_unlock(&vvs->tx_lock);
+
+	return bytes;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_stream_has_space);
+
+int virtio_transport_do_socket_init(struct vsock_sock *vsk,
+				    struct vsock_sock *psk)
+{
+	struct virtio_vsock_sock *vvs;
+
+	vvs = kzalloc(sizeof(*vvs), GFP_KERNEL);
+	if (!vvs)
+		return -ENOMEM;
+
+	vsk->trans = vvs;
+	vvs->vsk = vsk;
+	if (psk) {
+		struct virtio_vsock_sock *ptrans = psk->trans;
+
+		vvs->buf_size	= ptrans->buf_size;
+		vvs->buf_size_min = ptrans->buf_size_min;
+		vvs->buf_size_max = ptrans->buf_size_max;
+		vvs->peer_buf_alloc = ptrans->peer_buf_alloc;
+	} else {
+		vvs->buf_size = VIRTIO_VSOCK_DEFAULT_BUF_SIZE;
+		vvs->buf_size_min = VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE;
+		vvs->buf_size_max = VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE;
+	}
+
+	vvs->buf_alloc = vvs->buf_size;
+
+	mutex_init(&vvs->rx_lock);
+	mutex_init(&vvs->tx_lock);
+	INIT_LIST_HEAD(&vvs->rx_queue);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_do_socket_init);
+
+u64 virtio_transport_get_buffer_size(struct vsock_sock *vsk)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+
+	return vvs->buf_size;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_get_buffer_size);
+
+u64 virtio_transport_get_min_buffer_size(struct vsock_sock *vsk)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+
+	return vvs->buf_size_min;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_get_min_buffer_size);
+
+u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+
+	return vvs->buf_size_max;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_get_max_buffer_size);
+
+void virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+
+	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
+		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
+	if (val < vvs->buf_size_min)
+		vvs->buf_size_min = val;
+	if (val > vvs->buf_size_max)
+		vvs->buf_size_max = val;
+	vvs->buf_size = val;
+	vvs->buf_alloc = val;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_set_buffer_size);
+
+void virtio_transport_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+
+	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
+		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
+	if (val > vvs->buf_size)
+		vvs->buf_size = val;
+	vvs->buf_size_min = val;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_set_min_buffer_size);
+
+void virtio_transport_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+
+	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
+		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
+	if (val < vvs->buf_size)
+		vvs->buf_size = val;
+	vvs->buf_size_max = val;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_set_max_buffer_size);
+
+int
+virtio_transport_notify_poll_in(struct vsock_sock *vsk,
+				size_t target,
+				bool *data_ready_now)
+{
+	if (vsock_stream_has_data(vsk))
+		*data_ready_now = true;
+	else
+		*data_ready_now = false;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_poll_in);
+
+int
+virtio_transport_notify_poll_out(struct vsock_sock *vsk,
+				 size_t target,
+				 bool *space_avail_now)
+{
+	s64 free_space;
+
+	free_space = vsock_stream_has_space(vsk);
+	if (free_space > 0)
+		*space_avail_now = true;
+	else if (free_space == 0)
+		*space_avail_now = false;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_poll_out);
+
+int virtio_transport_notify_recv_init(struct vsock_sock *vsk,
+	size_t target, struct vsock_transport_recv_notify_data *data)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_recv_init);
+
+int virtio_transport_notify_recv_pre_block(struct vsock_sock *vsk,
+	size_t target, struct vsock_transport_recv_notify_data *data)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_recv_pre_block);
+
+int virtio_transport_notify_recv_pre_dequeue(struct vsock_sock *vsk,
+	size_t target, struct vsock_transport_recv_notify_data *data)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_recv_pre_dequeue);
+
+int virtio_transport_notify_recv_post_dequeue(struct vsock_sock *vsk,
+	size_t target, ssize_t copied, bool data_read,
+	struct vsock_transport_recv_notify_data *data)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_recv_post_dequeue);
+
+int virtio_transport_notify_send_init(struct vsock_sock *vsk,
+	struct vsock_transport_send_notify_data *data)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_send_init);
+
+int virtio_transport_notify_send_pre_block(struct vsock_sock *vsk,
+	struct vsock_transport_send_notify_data *data)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_send_pre_block);
+
+int virtio_transport_notify_send_pre_enqueue(struct vsock_sock *vsk,
+	struct vsock_transport_send_notify_data *data)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_send_pre_enqueue);
+
+int virtio_transport_notify_send_post_enqueue(struct vsock_sock *vsk,
+	ssize_t written, struct vsock_transport_send_notify_data *data)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_notify_send_post_enqueue);
+
+u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+
+	return vvs->buf_size;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_stream_rcvhiwat);
+
+bool virtio_transport_stream_is_active(struct vsock_sock *vsk)
+{
+	return true;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_stream_is_active);
+
+bool virtio_transport_stream_allow(u32 cid, u32 port)
+{
+	return true;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
+
+int virtio_transport_dgram_bind(struct vsock_sock *vsk,
+				struct sockaddr_vm *addr)
+{
+	return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
+
+bool virtio_transport_dgram_allow(u32 cid, u32 port)
+{
+	return false;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
+
+int virtio_transport_connect(struct vsock_sock *vsk)
+{
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_REQUEST,
+		.type = VIRTIO_VSOCK_TYPE_STREAM,
+	};
+
+	return virtio_transport_send_pkt(vsk, &info);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_connect);
+
+int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
+{
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_SHUTDOWN,
+		.type = VIRTIO_VSOCK_TYPE_STREAM,
+		.flags = (mode & RCV_SHUTDOWN ?
+			  VIRTIO_VSOCK_SHUTDOWN_RCV : 0) |
+			 (mode & SEND_SHUTDOWN ?
+			  VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
+	};
+
+	return virtio_transport_send_pkt(vsk, &info);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_shutdown);
+
+void virtio_transport_release(struct vsock_sock *vsk)
+{
+	struct sock *sk = &vsk->sk;
+
+	/* Tell other side to terminate connection */
+	if (sk->sk_type == SOCK_STREAM &&
+	    vsk->peer_shutdown != SHUTDOWN_MASK &&
+	    sk->sk_state == SS_CONNECTED)
+		(void)virtio_transport_shutdown(vsk, SHUTDOWN_MASK);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_release);
+
+int
+virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
+			       struct sockaddr_vm *remote_addr,
+			       struct msghdr *msg,
+			       size_t dgram_len)
+{
+	return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
+
+ssize_t
+virtio_transport_stream_enqueue(struct vsock_sock *vsk,
+				struct msghdr *msg,
+				size_t len)
+{
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_RW,
+		.type = VIRTIO_VSOCK_TYPE_STREAM,
+		.msg = msg,
+		.pkt_len = len,
+	};
+
+	return virtio_transport_send_pkt(vsk, &info);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_stream_enqueue);
+
+void virtio_transport_destruct(struct vsock_sock *vsk)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+
+	kfree(vvs);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_destruct);
+
+static int virtio_transport_send_reset(struct vsock_sock *vsk,
+				       struct virtio_vsock_pkt *pkt)
+{
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_RST,
+		.type = VIRTIO_VSOCK_TYPE_STREAM,
+	};
+
+	/* Send RST only if the original pkt is not a RST pkt */
+	if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)
+		return 0;
+
+	return virtio_transport_send_pkt(vsk, &info);
+}
+
+/* Normally packets are associated with a socket.  There may be no socket if an
+ * attempt was made to connect to a socket that does not exist.
+ */
+static int virtio_transport_send_reset_no_sock(struct virtio_vsock_pkt *pkt)
+{
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_RST,
+		.type = le16_to_cpu(pkt->hdr.type),
+	};
+
+	/* Send RST only if the original pkt is not a RST pkt */
+	if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)
+		return 0;
+
+	pkt = virtio_transport_alloc_pkt(&info, 0,
+					 le32_to_cpu(pkt->hdr.dst_cid),
+					 le32_to_cpu(pkt->hdr.dst_port),
+					 le32_to_cpu(pkt->hdr.src_cid),
+					 le32_to_cpu(pkt->hdr.src_port));
+	if (!pkt)
+		return -ENOMEM;
+
+	return virtio_transport_send_pkt_no_sock(pkt);
+}
+
+static int
+virtio_transport_recv_connecting(struct sock *sk,
+				 struct virtio_vsock_pkt *pkt)
+{
+	struct vsock_sock *vsk = vsock_sk(sk);
+	int err;
+	int skerr;
+
+	switch (le16_to_cpu(pkt->hdr.op)) {
+	case VIRTIO_VSOCK_OP_RESPONSE:
+		sk->sk_state = SS_CONNECTED;
+		sk->sk_socket->state = SS_CONNECTED;
+		vsock_insert_connected(vsk);
+		sk->sk_state_change(sk);
+		break;
+	case VIRTIO_VSOCK_OP_INVALID:
+		break;
+	case VIRTIO_VSOCK_OP_RST:
+		skerr = ECONNRESET;
+		err = 0;
+		goto destroy;
+	default:
+		skerr = EPROTO;
+		err = -EINVAL;
+		goto destroy;
+	}
+	return 0;
+
+destroy:
+	virtio_transport_send_reset(vsk, pkt);
+	sk->sk_state = SS_UNCONNECTED;
+	sk->sk_err = skerr;
+	sk->sk_error_report(sk);
+	return err;
+}
+
+static int
+virtio_transport_recv_connected(struct sock *sk,
+				struct virtio_vsock_pkt *pkt)
+{
+	struct vsock_sock *vsk = vsock_sk(sk);
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	int err = 0;
+
+	switch (le16_to_cpu(pkt->hdr.op)) {
+	case VIRTIO_VSOCK_OP_RW:
+		pkt->len = le32_to_cpu(pkt->hdr.len);
+		pkt->off = 0;
+
+		mutex_lock(&vvs->rx_lock);
+		virtio_transport_inc_rx_pkt(vvs, pkt);
+		list_add_tail(&pkt->list, &vvs->rx_queue);
+		mutex_unlock(&vvs->rx_lock);
+
+		sk->sk_data_ready(sk);
+		return err;
+	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
+		sk->sk_write_space(sk);
+		break;
+	case VIRTIO_VSOCK_OP_SHUTDOWN:
+		if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_RCV)
+			vsk->peer_shutdown |= RCV_SHUTDOWN;
+		if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_SEND)
+			vsk->peer_shutdown |= SEND_SHUTDOWN;
+		if (vsk->peer_shutdown == SHUTDOWN_MASK &&
+		    vsock_stream_has_data(vsk) <= 0)
+			sk->sk_state = SS_DISCONNECTING;
+		if (le32_to_cpu(pkt->hdr.flags))
+			sk->sk_state_change(sk);
+		break;
+	case VIRTIO_VSOCK_OP_RST:
+		sock_set_flag(sk, SOCK_DONE);
+		vsk->peer_shutdown = SHUTDOWN_MASK;
+		if (vsock_stream_has_data(vsk) <= 0)
+			sk->sk_state = SS_DISCONNECTING;
+		sk->sk_state_change(sk);
+		break;
+	default:
+		err = -EINVAL;
+		break;
+	}
+
+	virtio_transport_free_pkt(pkt);
+	return err;
+}
+
+static int
+virtio_transport_send_response(struct vsock_sock *vsk,
+			       struct virtio_vsock_pkt *pkt)
+{
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_RESPONSE,
+		.type = VIRTIO_VSOCK_TYPE_STREAM,
+		.remote_cid = le32_to_cpu(pkt->hdr.src_cid),
+		.remote_port = le32_to_cpu(pkt->hdr.src_port),
+	};
+
+	return virtio_transport_send_pkt(vsk, &info);
+}
+
+/* Handle server socket */
+static int
+virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt)
+{
+	struct vsock_sock *vsk = vsock_sk(sk);
+	struct vsock_sock *vchild;
+	struct sock *child;
+
+	if (le16_to_cpu(pkt->hdr.op) != VIRTIO_VSOCK_OP_REQUEST) {
+		virtio_transport_send_reset(vsk, pkt);
+		return -EINVAL;
+	}
+
+	if (sk_acceptq_is_full(sk)) {
+		virtio_transport_send_reset(vsk, pkt);
+		return -ENOMEM;
+	}
+
+	child = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
+			       sk->sk_type, 0);
+	if (!child) {
+		virtio_transport_send_reset(vsk, pkt);
+		return -ENOMEM;
+	}
+
+	sk->sk_ack_backlog++;
+
+	lock_sock(child);
+
+	child->sk_state = SS_CONNECTED;
+
+	vchild = vsock_sk(child);
+	vsock_addr_init(&vchild->local_addr, le32_to_cpu(pkt->hdr.dst_cid),
+			le32_to_cpu(pkt->hdr.dst_port));
+	vsock_addr_init(&vchild->remote_addr, le32_to_cpu(pkt->hdr.src_cid),
+			le32_to_cpu(pkt->hdr.src_port));
+
+	vsock_insert_connected(vchild);
+	vsock_enqueue_accept(sk, child);
+	virtio_transport_send_response(vchild, pkt);
+
+	release_sock(child);
+
+	sk->sk_data_ready(sk);
+	return 0;
+}
+
+static void virtio_transport_space_update(struct sock *sk,
+					  struct virtio_vsock_pkt *pkt)
+{
+	struct vsock_sock *vsk = vsock_sk(sk);
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	bool space_available;
+
+	/* buf_alloc and fwd_cnt are always included in the hdr */
+	mutex_lock(&vvs->tx_lock);
+	vvs->peer_buf_alloc = le32_to_cpu(pkt->hdr.buf_alloc);
+	vvs->peer_fwd_cnt = le32_to_cpu(pkt->hdr.fwd_cnt);
+	space_available = virtio_transport_has_space(vsk);
+	mutex_unlock(&vvs->tx_lock);
+
+	if (space_available)
+		sk->sk_write_space(sk);
+}
+
+/* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
+ * lock.
+ */
+void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
+{
+	struct sockaddr_vm src, dst;
+	struct vsock_sock *vsk;
+	struct sock *sk;
+
+	vsock_addr_init(&src, le32_to_cpu(pkt->hdr.src_cid),
+			le32_to_cpu(pkt->hdr.src_port));
+	vsock_addr_init(&dst, le32_to_cpu(pkt->hdr.dst_cid),
+			le32_to_cpu(pkt->hdr.dst_port));
+
+	trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
+					dst.svm_cid, dst.svm_port,
+					le32_to_cpu(pkt->hdr.len),
+					le16_to_cpu(pkt->hdr.type),
+					le16_to_cpu(pkt->hdr.op),
+					le32_to_cpu(pkt->hdr.flags),
+					le32_to_cpu(pkt->hdr.buf_alloc),
+					le32_to_cpu(pkt->hdr.fwd_cnt));
+
+	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
+		(void)virtio_transport_send_reset_no_sock(pkt);
+		goto free_pkt;
+	}
+
+	/* The socket must be in connected or bound table
+	 * otherwise send reset back
+	 */
+	sk = vsock_find_connected_socket(&src, &dst);
+	if (!sk) {
+		sk = vsock_find_bound_socket(&dst);
+		if (!sk) {
+			(void)virtio_transport_send_reset_no_sock(pkt);
+			goto free_pkt;
+		}
+	}
+
+	vsk = vsock_sk(sk);
+
+	virtio_transport_space_update(sk, pkt);
+
+	lock_sock(sk);
+	switch (sk->sk_state) {
+	case VSOCK_SS_LISTEN:
+		virtio_transport_recv_listen(sk, pkt);
+		virtio_transport_free_pkt(pkt);
+		break;
+	case SS_CONNECTING:
+		virtio_transport_recv_connecting(sk, pkt);
+		virtio_transport_free_pkt(pkt);
+		break;
+	case SS_CONNECTED:
+		virtio_transport_recv_connected(sk, pkt);
+		break;
+	default:
+		virtio_transport_free_pkt(pkt);
+		break;
+	}
+	release_sock(sk);
+
+	/* Release refcnt obtained when we fetched this socket out of the
+	 * bound or connected list.
+	 */
+	sock_put(sk);
+	return;
+
+free_pkt:
+	virtio_transport_free_pkt(pkt);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
+
+void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
+{
+	kfree(pkt->buf);
+	kfree(pkt);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Asias He");
+MODULE_DESCRIPTION("common code for virtio vsock");
-- 
2.5.0

* [RFC v4 3/5] VSOCK: Introduce virtio_transport.ko
From: Stefan Hajnoczi @ 2015-12-22  9:07 UTC (permalink / raw)
  To: kvm
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, netdev, virtualization,
	Matt Benjamin, Asias He, Christoffer Dall, matt.ma

From: Asias He <asias@redhat.com>

VM sockets virtio transport implementation.  This driver runs in the
guest.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v4:
 * Add MAINTAINERS file entry
 * Drop short/long rx packets
 * checkpatch.pl cleanups
 * Clarify locking in struct virtio_vsock
 * Narrow local variable scopes as suggested by Alex Bennee
 * Call wake_up() after decrementing total_tx_buf to avoid deadlock
v2:
 * Fix total_tx_buf accounting
 * Add virtio_transport global mutex to prevent races

---
 MAINTAINERS                      |   1 +
 net/vmw_vsock/virtio_transport.c | 481 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 482 insertions(+)
 create mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d42db78..67d8504 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11369,6 +11369,7 @@ S:	Maintained
 F:	include/linux/virtio_vsock.h
 F:	include/uapi/linux/virtio_vsock.h
 F:	net/vmw_vsock/virtio_transport_common.c
+F:	net/vmw_vsock/virtio_transport.c
 
 VIRTUAL SERIO DEVICE DRIVER
 M:	Stephen Chandler Paul <thatslyude@gmail.com>
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
new file mode 100644
index 0000000..e4787bf
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport.c
@@ -0,0 +1,481 @@
+/*
+ * virtio transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <asias@redhat.com>
+ *         Stefan Hajnoczi <stefanha@redhat.com>
+ *
+ * Some of the code is taken from Gerd Hoffmann <kraxel@redhat.com>'s
+ * early virtio-vsock proof-of-concept bits.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_vsock.h>
+#include <net/sock.h>
+#include <linux/mutex.h>
+#include <net/af_vsock.h>
+
+static struct workqueue_struct *virtio_vsock_workqueue;
+static struct virtio_vsock *the_virtio_vsock;
+static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
+
+struct virtio_vsock {
+	struct virtio_device *vdev;
+	struct virtqueue *vqs[VSOCK_VQ_MAX];
+
+	/* Virtqueue processing is deferred to a workqueue */
+	struct work_struct tx_work;
+	struct work_struct rx_work;
+
+	wait_queue_head_t tx_wait;	/* for waiting for tx resources */
+
+	/* The following fields are protected by tx_lock.  vqs[VSOCK_VQ_TX]
+	 * must be accessed with tx_lock held.
+	 */
+	struct mutex tx_lock;
+	u32 total_tx_buf;
+
+	/* The following fields are protected by rx_lock.  vqs[VSOCK_VQ_RX]
+	 * must be accessed with rx_lock held.
+	 */
+	struct mutex rx_lock;
+	int rx_buf_nr;
+	int rx_buf_max_nr;
+
+	u32 guest_cid;
+};
+
+static struct virtio_vsock *virtio_vsock_get(void)
+{
+	return the_virtio_vsock;
+}
+
+static u32 virtio_transport_get_local_cid(void)
+{
+	struct virtio_vsock *vsock = virtio_vsock_get();
+
+	return vsock->guest_cid;
+}
+
+static int
+virtio_transport_send_one_pkt(struct virtio_vsock *vsock,
+			      struct virtio_vsock_pkt *pkt)
+{
+	struct scatterlist hdr, buf, *sgs[2];
+	int ret, in_sg = 0, out_sg = 0;
+	struct virtqueue *vq;
+	DEFINE_WAIT(wait);
+
+	vq = vsock->vqs[VSOCK_VQ_TX];
+
+	/* Put pkt in the virtqueue */
+	sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
+	sgs[out_sg++] = &hdr;
+	if (pkt->buf) {
+		sg_init_one(&buf, pkt->buf, pkt->len);
+		sgs[out_sg++] = &buf;
+	}
+
+	mutex_lock(&vsock->tx_lock);
+	while ((ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt,
+					GFP_KERNEL)) < 0) {
+		prepare_to_wait_exclusive(&vsock->tx_wait, &wait,
+					  TASK_UNINTERRUPTIBLE);
+		mutex_unlock(&vsock->tx_lock);
+		schedule();
+		mutex_lock(&vsock->tx_lock);
+		finish_wait(&vsock->tx_wait, &wait);
+	}
+	virtqueue_kick(vq);
+	mutex_unlock(&vsock->tx_lock);
+
+	return pkt->len;
+}
+
+static int
+virtio_transport_send_pkt_no_sock(struct virtio_vsock_pkt *pkt)
+{
+	struct virtio_vsock *vsock;
+
+	vsock = virtio_vsock_get();
+	if (!vsock) {
+		virtio_transport_free_pkt(pkt);
+		return -ENODEV;
+	}
+
+	return virtio_transport_send_one_pkt(vsock, pkt);
+}
+
+static int
+virtio_transport_send_pkt(struct vsock_sock *vsk,
+			  struct virtio_vsock_pkt_info *info)
+{
+	u32 src_cid, src_port, dst_cid, dst_port;
+	struct virtio_vsock_sock *vvs;
+	struct virtio_vsock_pkt *pkt;
+	struct virtio_vsock *vsock;
+	u32 pkt_len = info->pkt_len;
+	DEFINE_WAIT(wait);
+
+	vsock = virtio_vsock_get();
+	if (!vsock)
+		return -ENODEV;
+
+	src_cid	= virtio_transport_get_local_cid();
+	src_port = vsk->local_addr.svm_port;
+	if (!info->remote_cid) {
+		dst_cid	= vsk->remote_addr.svm_cid;
+		dst_port = vsk->remote_addr.svm_port;
+	} else {
+		dst_cid = info->remote_cid;
+		dst_port = info->remote_port;
+	}
+
+	vvs = vsk->trans;
+
+	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
+		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
+	/* Do not send zero-length OP_RW pkt */
+	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
+		return pkt_len;
+
+	/* Respect global tx buf limitation */
+	mutex_lock(&vsock->tx_lock);
+	while (pkt_len + vsock->total_tx_buf > VIRTIO_VSOCK_MAX_TX_BUF_SIZE) {
+		prepare_to_wait_exclusive(&vsock->tx_wait, &wait,
+					  TASK_UNINTERRUPTIBLE);
+		mutex_unlock(&vsock->tx_lock);
+		schedule();
+		mutex_lock(&vsock->tx_lock);
+		finish_wait(&vsock->tx_wait, &wait);
+	}
+	vsock->total_tx_buf += pkt_len;
+	mutex_unlock(&vsock->tx_lock);
+
+	pkt = virtio_transport_alloc_pkt(info, pkt_len,
+					 src_cid, src_port,
+					 dst_cid, dst_port);
+	if (!pkt) {
+		mutex_lock(&vsock->tx_lock);
+		vsock->total_tx_buf -= pkt_len;
+		mutex_unlock(&vsock->tx_lock);
+		virtio_transport_put_credit(vvs, pkt_len);
+		wake_up(&vsock->tx_wait);
+		return -ENOMEM;
+	}
+
+	virtio_transport_inc_tx_pkt(vvs, pkt);
+
+	return virtio_transport_send_one_pkt(vsock, pkt);
+}
+
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
+{
+	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+	struct virtio_vsock_pkt *pkt;
+	struct scatterlist hdr, buf, *sgs[2];
+	struct virtqueue *vq;
+	int ret;
+
+	vq = vsock->vqs[VSOCK_VQ_RX];
+
+	do {
+		pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
+		if (!pkt)
+			break;
+
+		pkt->buf = kmalloc(buf_len, GFP_KERNEL);
+		if (!pkt->buf) {
+			virtio_transport_free_pkt(pkt);
+			break;
+		}
+
+		pkt->len = buf_len;
+
+		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
+		sgs[0] = &hdr;
+
+		sg_init_one(&buf, pkt->buf, buf_len);
+		sgs[1] = &buf;
+		ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);
+		if (ret) {
+			virtio_transport_free_pkt(pkt);
+			break;
+		}
+		vsock->rx_buf_nr++;
+	} while (vq->num_free);
+	if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
+		vsock->rx_buf_max_nr = vsock->rx_buf_nr;
+	virtqueue_kick(vq);
+}
+
+static void virtio_transport_send_pkt_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, tx_work);
+	struct virtqueue *vq;
+	bool added = false;
+
+	vq = vsock->vqs[VSOCK_VQ_TX];
+	mutex_lock(&vsock->tx_lock);
+	do {
+		struct virtio_vsock_pkt *pkt;
+		unsigned int len;
+
+		virtqueue_disable_cb(vq);
+		while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
+			vsock->total_tx_buf -= pkt->len;
+			virtio_transport_free_pkt(pkt);
+			added = true;
+		}
+	} while (!virtqueue_enable_cb(vq));
+	mutex_unlock(&vsock->tx_lock);
+
+	if (added)
+		wake_up(&vsock->tx_wait);
+}
+
+static void virtio_transport_recv_pkt_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, rx_work);
+	struct virtqueue *vq;
+
+	vq = vsock->vqs[VSOCK_VQ_RX];
+	mutex_lock(&vsock->rx_lock);
+	do {
+		struct virtio_vsock_pkt *pkt;
+		unsigned int len;
+
+		virtqueue_disable_cb(vq);
+		while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
+			vsock->rx_buf_nr--;
+
+			/* Drop short/long packets */
+			if (unlikely(len < sizeof(pkt->hdr) ||
+				     len > sizeof(pkt->hdr) + pkt->len)) {
+				virtio_transport_free_pkt(pkt);
+				continue;
+			}
+
+			pkt->len = len - sizeof(pkt->hdr);
+			virtio_transport_recv_pkt(pkt);
+		}
+	} while (!virtqueue_enable_cb(vq));
+
+	if (vsock->rx_buf_nr < vsock->rx_buf_max_nr / 2)
+		virtio_vsock_rx_fill(vsock);
+	mutex_unlock(&vsock->rx_lock);
+}
+
+static void virtio_vsock_ctrl_done(struct virtqueue *vq)
+{
+}
+
+static void virtio_vsock_tx_done(struct virtqueue *vq)
+{
+	struct virtio_vsock *vsock = vq->vdev->priv;
+
+	if (!vsock)
+		return;
+	queue_work(virtio_vsock_workqueue, &vsock->tx_work);
+}
+
+static void virtio_vsock_rx_done(struct virtqueue *vq)
+{
+	struct virtio_vsock *vsock = vq->vdev->priv;
+
+	if (!vsock)
+		return;
+	queue_work(virtio_vsock_workqueue, &vsock->rx_work);
+}
+
+static struct virtio_transport virtio_transport = {
+	.transport = {
+		.get_local_cid            = virtio_transport_get_local_cid,
+
+		.init                     = virtio_transport_do_socket_init,
+		.destruct                 = virtio_transport_destruct,
+		.release                  = virtio_transport_release,
+		.connect                  = virtio_transport_connect,
+		.shutdown                 = virtio_transport_shutdown,
+
+		.dgram_bind               = virtio_transport_dgram_bind,
+		.dgram_dequeue            = virtio_transport_dgram_dequeue,
+		.dgram_enqueue            = virtio_transport_dgram_enqueue,
+		.dgram_allow              = virtio_transport_dgram_allow,
+
+		.stream_dequeue           = virtio_transport_stream_dequeue,
+		.stream_enqueue           = virtio_transport_stream_enqueue,
+		.stream_has_data          = virtio_transport_stream_has_data,
+		.stream_has_space         = virtio_transport_stream_has_space,
+		.stream_rcvhiwat          = virtio_transport_stream_rcvhiwat,
+		.stream_is_active         = virtio_transport_stream_is_active,
+		.stream_allow             = virtio_transport_stream_allow,
+
+		.notify_poll_in           = virtio_transport_notify_poll_in,
+		.notify_poll_out          = virtio_transport_notify_poll_out,
+		.notify_recv_init         = virtio_transport_notify_recv_init,
+		.notify_recv_pre_block    = virtio_transport_notify_recv_pre_block,
+		.notify_recv_pre_dequeue  = virtio_transport_notify_recv_pre_dequeue,
+		.notify_recv_post_dequeue = virtio_transport_notify_recv_post_dequeue,
+		.notify_send_init         = virtio_transport_notify_send_init,
+		.notify_send_pre_block    = virtio_transport_notify_send_pre_block,
+		.notify_send_pre_enqueue  = virtio_transport_notify_send_pre_enqueue,
+		.notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue,
+
+		.set_buffer_size          = virtio_transport_set_buffer_size,
+		.set_min_buffer_size      = virtio_transport_set_min_buffer_size,
+		.set_max_buffer_size      = virtio_transport_set_max_buffer_size,
+		.get_buffer_size          = virtio_transport_get_buffer_size,
+		.get_min_buffer_size      = virtio_transport_get_min_buffer_size,
+		.get_max_buffer_size      = virtio_transport_get_max_buffer_size,
+	},
+
+	.send_pkt		= virtio_transport_send_pkt,
+	.send_pkt_no_sock	= virtio_transport_send_pkt_no_sock,
+};
+
+static int virtio_vsock_probe(struct virtio_device *vdev)
+{
+	vq_callback_t *callbacks[] = {
+		virtio_vsock_ctrl_done,
+		virtio_vsock_rx_done,
+		virtio_vsock_tx_done,
+	};
+	static const char * const names[] = {
+		"ctrl",
+		"rx",
+		"tx",
+	};
+	struct virtio_vsock *vsock = NULL;
+	u32 guest_cid;
+	int ret;
+
+	ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);
+	if (ret)
+		return ret;
+
+	/* Only one virtio-vsock device per guest is supported */
+	if (the_virtio_vsock) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	vsock = kzalloc(sizeof(*vsock), GFP_KERNEL);
+	if (!vsock) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	vsock->vdev = vdev;
+
+	ret = vsock->vdev->config->find_vqs(vsock->vdev, VSOCK_VQ_MAX,
+					    vsock->vqs, callbacks, names);
+	if (ret < 0)
+		goto out;
+
+	vdev->config->get(vdev, offsetof(struct virtio_vsock_config, guest_cid),
+			  &guest_cid, sizeof(guest_cid));
+	vsock->guest_cid = le32_to_cpu(guest_cid);
+
+	ret = vsock_core_init(&virtio_transport.transport);
+	if (ret < 0)
+		goto out_vqs;
+
+	vsock->rx_buf_nr = 0;
+	vsock->rx_buf_max_nr = 0;
+
+	vdev->priv = vsock;
+	the_virtio_vsock = vsock;
+	init_waitqueue_head(&vsock->tx_wait);
+	mutex_init(&vsock->tx_lock);
+	mutex_init(&vsock->rx_lock);
+	INIT_WORK(&vsock->rx_work, virtio_transport_recv_pkt_work);
+	INIT_WORK(&vsock->tx_work, virtio_transport_send_pkt_work);
+
+	mutex_lock(&vsock->rx_lock);
+	virtio_vsock_rx_fill(vsock);
+	mutex_unlock(&vsock->rx_lock);
+
+	mutex_unlock(&the_virtio_vsock_mutex);
+	return 0;
+
+out_vqs:
+	vsock->vdev->config->del_vqs(vsock->vdev);
+out:
+	kfree(vsock);
+	mutex_unlock(&the_virtio_vsock_mutex);
+	return ret;
+}
+
+static void virtio_vsock_remove(struct virtio_device *vdev)
+{
+	struct virtio_vsock *vsock = vdev->priv;
+
+	flush_work(&vsock->rx_work);
+	flush_work(&vsock->tx_work);
+
+	vdev->config->reset(vdev);
+
+	mutex_lock(&the_virtio_vsock_mutex);
+	the_virtio_vsock = NULL;
+	vsock_core_exit();
+	mutex_unlock(&the_virtio_vsock_mutex);
+
+	vdev->config->del_vqs(vdev);
+
+	kfree(vsock);
+}
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_VSOCK, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+static unsigned int features[] = {
+};
+
+static struct virtio_driver virtio_vsock_driver = {
+	.feature_table = features,
+	.feature_table_size = ARRAY_SIZE(features),
+	.driver.name = KBUILD_MODNAME,
+	.driver.owner = THIS_MODULE,
+	.id_table = id_table,
+	.probe = virtio_vsock_probe,
+	.remove = virtio_vsock_remove,
+};
+
+static int __init virtio_vsock_init(void)
+{
+	int ret;
+
+	virtio_vsock_workqueue = alloc_workqueue("virtio_vsock", 0, 0);
+	if (!virtio_vsock_workqueue)
+		return -ENOMEM;
+	ret = register_virtio_driver(&virtio_vsock_driver);
+	if (ret)
+		destroy_workqueue(virtio_vsock_workqueue);
+	return ret;
+}
+
+static void __exit virtio_vsock_exit(void)
+{
+	unregister_virtio_driver(&virtio_vsock_driver);
+	destroy_workqueue(virtio_vsock_workqueue);
+}
+
+module_init(virtio_vsock_init);
+module_exit(virtio_vsock_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Asias He");
+MODULE_DESCRIPTION("virtio transport for vsock");
+MODULE_DEVICE_TABLE(virtio, id_table);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC v4 4/5] VSOCK: Introduce vhost_vsock.ko
  2015-12-22  9:07 [RFC v4 0/5] Add virtio transport for AF_VSOCK Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2015-12-22  9:07 ` [RFC v4 3/5] VSOCK: Introduce virtio_transport.ko Stefan Hajnoczi
@ 2015-12-22  9:07 ` Stefan Hajnoczi
  2015-12-22  9:07 ` [RFC v4 5/5] VSOCK: Add Makefile and Kconfig Stefan Hajnoczi
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2015-12-22  9:07 UTC (permalink / raw)
  To: kvm
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, netdev, virtualization,
	Matt Benjamin, Asias He, Christoffer Dall, matt.ma

From: Asias He <asias@redhat.com>

VM sockets vhost transport implementation.  This driver runs on the
host.
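
Not part of this patch, but for illustration: a rough sketch of how a
VMM might open the device and assign the guest CID with the new ioctl
added below.  The function name is made up, error handling is omitted,
and the CID value is just an example; the memory table and vring setup
use the standard vhost ioctls:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/types.h>
  #include <linux/vhost.h>

  /* Not in linux/vhost.h yet; mirrors the define in drivers/vhost/vsock.h */
  #define VHOST_VSOCK_SET_GUEST_CID _IOW(VHOST_VIRTIO, 0x60, __u32)

  static int vhost_vsock_setup(__u32 guest_cid)       /* e.g. 3 */
  {
          int fd = open("/dev/vhost-vsock", O_RDWR);

          ioctl(fd, VHOST_SET_OWNER);     /* bind the device to this process */
          ioctl(fd, VHOST_VSOCK_SET_GUEST_CID, &guest_cid);
          /* ...then VHOST_SET_MEM_TABLE and the usual vring setup ioctls... */
          return fd;
  }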

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v4:
 * Add MAINTAINERS file entry
 * virtqueue used len is now sizeof(pkt->hdr) + pkt->len instead of just
   pkt->len
 * checkpatch.pl cleanups
 * Clarify struct vhost_vsock locking
 * Add comments about optimization that disables virtqueue notify
 * Drop unused vhost_vsock_handle_ctl_kick()
 * Call wake_up() after decrementing total_tx_buf to prevent deadlock
v3:
 * Remove unneeded variable used to store return value
   (Fengguang Wu <fengguang.wu@intel.com> and Julia Lawall
   <julia.lawall@lip6.fr>)
v2:
 * Add missing total_tx_buf decrement
 * Support flexible rx/tx descriptor layout
 * Refuse to assign reserved CIDs
 * Refuse guest CID if already in use
 * Only accept correctly addressed packets
---
 MAINTAINERS           |   2 +
 drivers/vhost/vsock.c | 607 ++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/vsock.h |   4 +
 3 files changed, 613 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 67d8504..0181dc2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11370,6 +11370,8 @@ F:	include/linux/virtio_vsock.h
 F:	include/uapi/linux/virtio_vsock.h
 F:	net/vmw_vsock/virtio_transport_common.c
 F:	net/vmw_vsock/virtio_transport.c
+F:	drivers/vhost/vsock.c
+F:	drivers/vhost/vsock.h
 
 VIRTUAL SERIO DEVICE DRIVER
 M:	Stephen Chandler Paul <thatslyude@gmail.com>
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
new file mode 100644
index 0000000..2c5963c
--- /dev/null
+++ b/drivers/vhost/vsock.c
@@ -0,0 +1,607 @@
+/*
+ * vhost transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <asias@redhat.com>
+ *         Stefan Hajnoczi <stefanha@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <net/sock.h>
+#include <linux/virtio_vsock.h>
+#include <linux/vhost.h>
+
+#include <net/af_vsock.h>
+#include "vhost.h"
+#include "vsock.h"
+
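+/* CID 2 is the well-known address of the host (VMADDR_CID_HOST) */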
+#define VHOST_VSOCK_DEFAULT_HOST_CID	2
+
+enum {
+	VHOST_VSOCK_FEATURES = VHOST_FEATURES,
+};
+
+/* Used to track all the vhost_vsock instances on the system. */
+static LIST_HEAD(vhost_vsock_list);
+static DEFINE_MUTEX(vhost_vsock_mutex);
+
+struct vhost_vsock {
+	struct vhost_dev dev;
+	struct vhost_virtqueue vqs[VSOCK_VQ_MAX];
+
+	/* Link to global vhost_vsock_list, protected by vhost_vsock_mutex */
+	struct list_head list;
+
+	struct vhost_work send_pkt_work;
+	wait_queue_head_t send_wait;
+
+	/* Fields protected by vqs[VSOCK_VQ_RX].mutex */
+	struct list_head send_pkt_list;	/* host->guest pending packets */
+	u32 total_tx_buf;
+
+	u32 guest_cid;
+};
+
+static u32 vhost_transport_get_local_cid(void)
+{
+	return VHOST_VSOCK_DEFAULT_HOST_CID;
+}
+
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+	struct vhost_vsock *vsock;
+
+	mutex_lock(&vhost_vsock_mutex);
+	list_for_each_entry(vsock, &vhost_vsock_list, list) {
+		if (vsock->guest_cid == guest_cid) {
+			mutex_unlock(&vhost_vsock_mutex);
+			return vsock;
+		}
+	}
+	mutex_unlock(&vhost_vsock_mutex);
+
+	return NULL;
+}
+
+static void
+vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
+			    struct vhost_virtqueue *vq)
+{
+	bool added = false;
+
+	mutex_lock(&vq->mutex);
+
+	/* Avoid further vmexits, we're already processing the virtqueue */
+	vhost_disable_notify(&vsock->dev, vq);
+
+	for (;;) {
+		struct virtio_vsock_pkt *pkt;
+		struct iov_iter iov_iter;
+		unsigned out, in;
+		size_t nbytes;
+		size_t len;
+		int head;
+
+		if (list_empty(&vsock->send_pkt_list)) {
+			vhost_enable_notify(&vsock->dev, vq);
+			break;
+		}
+
+		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					 &out, &in, NULL, NULL);
+		if (head < 0)
+			break;
+
+		if (head == vq->num) {
+			/* We cannot finish yet if more buffers snuck in while
+			 * re-enabling notify.
+			 */
+			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
+				vhost_disable_notify(&vsock->dev, vq);
+				continue;
+			}
+			break;
+		}
+
+		pkt = list_first_entry(&vsock->send_pkt_list,
+				       struct virtio_vsock_pkt, list);
+		list_del_init(&pkt->list);
+
+		if (out) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Expected 0 output buffers, got %u\n", out);
+			break;
+		}
+
+		len = iov_length(&vq->iov[out], in);
+		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+
+		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
+		if (nbytes != sizeof(pkt->hdr)) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Faulted on copying pkt hdr\n");
+			break;
+		}
+
+		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
+		if (nbytes != pkt->len) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Faulted on copying pkt buf\n");
+			break;
+		}
+
+		vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
+		added = true;
+
+		vsock->total_tx_buf -= pkt->len;
+
+		virtio_transport_free_pkt(pkt);
+	}
+	if (added)
+		vhost_signal(&vsock->dev, vq);
+	mutex_unlock(&vq->mutex);
+
+	if (added)
+		wake_up(&vsock->send_wait);
+}
+
+static void vhost_transport_send_pkt_work(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq;
+	struct vhost_vsock *vsock;
+
+	vsock = container_of(work, struct vhost_vsock, send_pkt_work);
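+	/* Virtqueues are named from the guest's perspective: host->guest
+	 * packets go on the guest's RX virtqueue.
+	 */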
+	vq = &vsock->vqs[VSOCK_VQ_RX];
+
+	vhost_transport_do_send_pkt(vsock, vq);
+}
+
+static int
+vhost_transport_send_one_pkt(struct vhost_vsock *vsock,
+			     struct virtio_vsock_pkt *pkt)
+{
+	struct vhost_virtqueue *vq = &vsock->vqs[VSOCK_VQ_RX];
+
+	/* Queue it up in vhost work */
+	mutex_lock(&vq->mutex);
+	list_add_tail(&pkt->list, &vsock->send_pkt_list);
+	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+	mutex_unlock(&vq->mutex);
+
+	return pkt->len;
+}
+
+static int
+vhost_transport_send_pkt_no_sock(struct virtio_vsock_pkt *pkt)
+{
+	struct vhost_vsock *vsock;
+
+	/* Find the vhost_vsock according to the guest context ID */
+	vsock = vhost_vsock_get(le32_to_cpu(pkt->hdr.dst_cid));
+	if (!vsock) {
+		virtio_transport_free_pkt(pkt);
+		return -ENODEV;
+	}
+
+	return vhost_transport_send_one_pkt(vsock, pkt);
+}
+
+static int
+vhost_transport_send_pkt(struct vsock_sock *vsk,
+			 struct virtio_vsock_pkt_info *info)
+{
+	u32 src_cid, src_port, dst_cid, dst_port;
+	struct virtio_vsock_sock *vvs;
+	struct virtio_vsock_pkt *pkt;
+	struct vhost_virtqueue *vq;
+	struct vhost_vsock *vsock;
+	u32 pkt_len = info->pkt_len;
+	DEFINE_WAIT(wait);
+
+	src_cid = vhost_transport_get_local_cid();
+	src_port = vsk->local_addr.svm_port;
+	if (!info->remote_cid) {
+		dst_cid	= vsk->remote_addr.svm_cid;
+		dst_port = vsk->remote_addr.svm_port;
+	} else {
+		dst_cid = info->remote_cid;
+		dst_port = info->remote_port;
+	}
+
+	/* Find the vhost_vsock according to the guest context ID */
+	vsock = vhost_vsock_get(dst_cid);
+	if (!vsock)
+		return -ENODEV;
+
+	vvs = vsk->trans;
+	vq = &vsock->vqs[VSOCK_VQ_RX];
+
+	/* we can send less than pkt_len bytes */
+	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
+		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+
+	/* virtio_transport_get_credit might return less than pkt_len credit */
+	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
+
+	/* Do not send a zero-length OP_RW pkt */
+	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
+		return pkt_len;
+
+	/* Respect global tx buf limitation */
+	mutex_lock(&vq->mutex);
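+	/* vhost_transport_do_send_pkt() wakes send_wait after it completes
+	 * packets and decrements total_tx_buf.
+	 */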
+	while (pkt_len + vsock->total_tx_buf > VIRTIO_VSOCK_MAX_TX_BUF_SIZE) {
+		prepare_to_wait_exclusive(&vsock->send_wait, &wait,
+					  TASK_UNINTERRUPTIBLE);
+		mutex_unlock(&vq->mutex);
+		schedule();
+		mutex_lock(&vq->mutex);
+		finish_wait(&vsock->send_wait, &wait);
+	}
+	vsock->total_tx_buf += pkt_len;
+	mutex_unlock(&vq->mutex);
+
+	pkt = virtio_transport_alloc_pkt(info, pkt_len,
+					 src_cid, src_port,
+					 dst_cid, dst_port);
+	if (!pkt) {
+		mutex_lock(&vq->mutex);
+		vsock->total_tx_buf -= pkt_len;
+		mutex_unlock(&vq->mutex);
+		virtio_transport_put_credit(vvs, pkt_len);
+		wake_up(&vsock->send_wait);
+		return -ENOMEM;
+	}
+
+	virtio_transport_inc_tx_pkt(vvs, pkt);
+
+	return vhost_transport_send_one_pkt(vsock, pkt);
+}
+
+static struct virtio_vsock_pkt *
+vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
+		      unsigned int out, unsigned int in)
+{
+	struct virtio_vsock_pkt *pkt;
+	struct iov_iter iov_iter;
+	size_t nbytes;
+	size_t len;
+
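+	/* Guest->host (tx) packets consist of readable (out) descriptors only */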
+	if (in != 0) {
+		vq_err(vq, "Expected 0 input buffers, got %u\n", in);
+		return NULL;
+	}
+
+	pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
+	if (!pkt)
+		return NULL;
+
+	len = iov_length(vq->iov, out);
+	iov_iter_init(&iov_iter, WRITE, vq->iov, out, len);
+
+	nbytes = copy_from_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
+	if (nbytes != sizeof(pkt->hdr)) {
+		vq_err(vq, "Expected %zu bytes for pkt->hdr, got %zu bytes\n",
+		       sizeof(pkt->hdr), nbytes);
+		kfree(pkt);
+		return NULL;
+	}
+
+	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM)
+		pkt->len = le32_to_cpu(pkt->hdr.len);
+
+	/* No payload */
+	if (!pkt->len)
+		return pkt;
+
+	/* The pkt is too big */
+	if (pkt->len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
+		kfree(pkt);
+		return NULL;
+	}
+
+	pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
+	if (!pkt->buf) {
+		kfree(pkt);
+		return NULL;
+	}
+
+	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
+	if (nbytes != pkt->len) {
+		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
+		       pkt->len, nbytes);
+		virtio_transport_free_pkt(pkt);
+		return NULL;
+	}
+
+	return pkt;
+}
+
+static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+						  poll.work);
+	struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
+						 dev);
+	struct virtio_vsock_pkt *pkt;
+	int head;
+	unsigned int out, in;
+	size_t len;
+	bool added = false;
+
+	mutex_lock(&vq->mutex);
+	vhost_disable_notify(&vsock->dev, vq);
+	for (;;) {
+		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					 &out, &in, NULL, NULL);
+		if (head < 0)
+			break;
+
+		if (head == vq->num) {
+			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
+				vhost_disable_notify(&vsock->dev, vq);
+				continue;
+			}
+			break;
+		}
+
+		pkt = vhost_vsock_alloc_pkt(vq, out, in);
+		if (!pkt) {
+			vq_err(vq, "Faulted on pkt\n");
+			continue;
+		}
+
+		/* virtio_transport_recv_pkt() consumes (and may free) pkt,
+		 * so save the length for vhost_add_used() first.
+		 */
+		len = pkt->len;
+
+		/* Only accept correctly addressed packets */
+		if (le32_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
+			virtio_transport_recv_pkt(pkt);
+		else
+			virtio_transport_free_pkt(pkt);
+
+		vhost_add_used(vq, head, sizeof(pkt->hdr) + len);
+		added = true;
+	}
+	if (added)
+		vhost_signal(&vsock->dev, vq);
+	mutex_unlock(&vq->mutex);
+}
+
+static void vhost_vsock_handle_rx_kick(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+						poll.work);
+	struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
+						 dev);
+
+	vhost_transport_do_send_pkt(vsock, vq);
+}
+
+static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
+{
+	struct vhost_virtqueue **vqs;
+	struct vhost_vsock *vsock;
+	int ret;
+
+	vsock = kzalloc(sizeof(*vsock), GFP_KERNEL);
+	if (!vsock)
+		return -ENOMEM;
+
+	vqs = kmalloc_array(VSOCK_VQ_MAX, sizeof(*vqs), GFP_KERNEL);
+	if (!vqs) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	vqs[VSOCK_VQ_CTRL] = &vsock->vqs[VSOCK_VQ_CTRL];
+	vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
+	vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
+	/* The CTRL virtqueue is currently not used so skip it. */
+	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
+	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
+
+	vhost_dev_init(&vsock->dev, vqs, VSOCK_VQ_MAX);
+
+	file->private_data = vsock;
+	init_waitqueue_head(&vsock->send_wait);
+	INIT_LIST_HEAD(&vsock->send_pkt_list);
+	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
+
+	mutex_lock(&vhost_vsock_mutex);
+	list_add_tail(&vsock->list, &vhost_vsock_list);
+	mutex_unlock(&vhost_vsock_mutex);
+	return 0;
+
+out:
+	kfree(vsock);
+	return ret;
+}
+
+static void vhost_vsock_flush(struct vhost_vsock *vsock)
+{
+	int i;
+
+	for (i = 0; i < VSOCK_VQ_MAX; i++)
+		if (vsock->vqs[i].handle_kick)
+			vhost_poll_flush(&vsock->vqs[i].poll);
+	vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);
+}
+
+static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
+{
+	struct vhost_vsock *vsock = file->private_data;
+
+	mutex_lock(&vhost_vsock_mutex);
+	list_del(&vsock->list);
+	mutex_unlock(&vhost_vsock_mutex);
+
+	vhost_dev_stop(&vsock->dev);
+	vhost_vsock_flush(vsock);
+	vhost_dev_cleanup(&vsock->dev, false);
+	kfree(vsock->dev.vqs);
+	kfree(vsock);
+	return 0;
+}
+
+static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u32 guest_cid)
+{
+	struct vhost_vsock *other;
+
+	/* Refuse reserved CIDs */
+	if (guest_cid <= VMADDR_CID_HOST)
+		return -EINVAL;
+
+	/* Refuse if CID is already in use */
+	other = vhost_vsock_get(guest_cid);
+	if (other && other != vsock)
+		return -EADDRINUSE;
+
+	mutex_lock(&vhost_vsock_mutex);
+	vsock->guest_cid = guest_cid;
+	mutex_unlock(&vhost_vsock_mutex);
+
+	return 0;
+}
+
+static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
+{
+	struct vhost_virtqueue *vq;
+	int i;
+
+	if (features & ~VHOST_VSOCK_FEATURES)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&vsock->dev.mutex);
+	if ((features & (1 << VHOST_F_LOG_ALL)) &&
+	    !vhost_log_access_ok(&vsock->dev)) {
+		mutex_unlock(&vsock->dev.mutex);
+		return -EFAULT;
+	}
+
+	for (i = 0; i < VSOCK_VQ_MAX; i++) {
+		vq = &vsock->vqs[i];
+		mutex_lock(&vq->mutex);
+		vq->acked_features = features;
+		mutex_unlock(&vq->mutex);
+	}
+	mutex_unlock(&vsock->dev.mutex);
+	return 0;
+}
+
+static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
+				  unsigned long arg)
+{
+	struct vhost_vsock *vsock = f->private_data;
+	void __user *argp = (void __user *)arg;
+	u64 __user *featurep = argp;
+	u32 __user *cidp = argp;
+	u32 guest_cid;
+	u64 features;
+	int r;
+
+	switch (ioctl) {
+	case VHOST_VSOCK_SET_GUEST_CID:
+		if (get_user(guest_cid, cidp))
+			return -EFAULT;
+		return vhost_vsock_set_cid(vsock, guest_cid);
+	case VHOST_GET_FEATURES:
+		features = VHOST_VSOCK_FEATURES;
+		if (copy_to_user(featurep, &features, sizeof(features)))
+			return -EFAULT;
+		return 0;
+	case VHOST_SET_FEATURES:
+		if (copy_from_user(&features, featurep, sizeof(features)))
+			return -EFAULT;
+		return vhost_vsock_set_features(vsock, features);
+	default:
+		mutex_lock(&vsock->dev.mutex);
+		r = vhost_dev_ioctl(&vsock->dev, ioctl, argp);
+		if (r == -ENOIOCTLCMD)
+			r = vhost_vring_ioctl(&vsock->dev, ioctl, argp);
+		else
+			vhost_vsock_flush(vsock);
+		mutex_unlock(&vsock->dev.mutex);
+		return r;
+	}
+}
+
+static const struct file_operations vhost_vsock_fops = {
+	.owner          = THIS_MODULE,
+	.open           = vhost_vsock_dev_open,
+	.release        = vhost_vsock_dev_release,
+	.llseek		= noop_llseek,
+	.unlocked_ioctl = vhost_vsock_dev_ioctl,
+};
+
+static struct miscdevice vhost_vsock_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "vhost-vsock",
+	.fops = &vhost_vsock_fops,
+};
+
+static struct virtio_transport vhost_transport = {
+	.transport = {
+		.get_local_cid            = vhost_transport_get_local_cid,
+
+		.init                     = virtio_transport_do_socket_init,
+		.destruct                 = virtio_transport_destruct,
+		.release                  = virtio_transport_release,
+		.connect                  = virtio_transport_connect,
+		.shutdown                 = virtio_transport_shutdown,
+
+		.dgram_enqueue            = virtio_transport_dgram_enqueue,
+		.dgram_dequeue            = virtio_transport_dgram_dequeue,
+		.dgram_bind               = virtio_transport_dgram_bind,
+		.dgram_allow              = virtio_transport_dgram_allow,
+
+		.stream_enqueue           = virtio_transport_stream_enqueue,
+		.stream_dequeue           = virtio_transport_stream_dequeue,
+		.stream_has_data          = virtio_transport_stream_has_data,
+		.stream_has_space         = virtio_transport_stream_has_space,
+		.stream_rcvhiwat          = virtio_transport_stream_rcvhiwat,
+		.stream_is_active         = virtio_transport_stream_is_active,
+		.stream_allow             = virtio_transport_stream_allow,
+
+		.notify_poll_in           = virtio_transport_notify_poll_in,
+		.notify_poll_out          = virtio_transport_notify_poll_out,
+		.notify_recv_init         = virtio_transport_notify_recv_init,
+		.notify_recv_pre_block    = virtio_transport_notify_recv_pre_block,
+		.notify_recv_pre_dequeue  = virtio_transport_notify_recv_pre_dequeue,
+		.notify_recv_post_dequeue = virtio_transport_notify_recv_post_dequeue,
+		.notify_send_init         = virtio_transport_notify_send_init,
+		.notify_send_pre_block    = virtio_transport_notify_send_pre_block,
+		.notify_send_pre_enqueue  = virtio_transport_notify_send_pre_enqueue,
+		.notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue,
+
+		.set_buffer_size          = virtio_transport_set_buffer_size,
+		.set_min_buffer_size      = virtio_transport_set_min_buffer_size,
+		.set_max_buffer_size      = virtio_transport_set_max_buffer_size,
+		.get_buffer_size          = virtio_transport_get_buffer_size,
+		.get_min_buffer_size      = virtio_transport_get_min_buffer_size,
+		.get_max_buffer_size      = virtio_transport_get_max_buffer_size,
+	},
+
+	.send_pkt = vhost_transport_send_pkt,
+	.send_pkt_no_sock = vhost_transport_send_pkt_no_sock,
+};
+
+static int __init vhost_vsock_init(void)
+{
+	int ret;
+
+	ret = vsock_core_init(&vhost_transport.transport);
+	if (ret < 0)
+		return ret;
+	return misc_register(&vhost_vsock_misc);
+}
+
+static void __exit vhost_vsock_exit(void)
+{
+	misc_deregister(&vhost_vsock_misc);
+	vsock_core_exit();
+}
+
+module_init(vhost_vsock_init);
+module_exit(vhost_vsock_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Asias He");
+MODULE_DESCRIPTION("vhost transport for vsock");
diff --git a/drivers/vhost/vsock.h b/drivers/vhost/vsock.h
new file mode 100644
index 0000000..0ddb107
--- /dev/null
+++ b/drivers/vhost/vsock.h
@@ -0,0 +1,4 @@
+#ifndef VHOST_VSOCK_H
+#define VHOST_VSOCK_H
+#define VHOST_VSOCK_SET_GUEST_CID _IOW(VHOST_VIRTIO, 0x60, __u32)
+#endif
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC v4 5/5] VSOCK: Add Makefile and Kconfig
  2015-12-22  9:07 [RFC v4 0/5] Add virtio transport for AF_VSOCK Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2015-12-22  9:07 ` [RFC v4 4/5] VSOCK: Introduce vhost_vsock.ko Stefan Hajnoczi
@ 2015-12-22  9:07 ` Stefan Hajnoczi
  2016-01-04  7:55 ` [RFC v4 0/5] Add virtio transport for AF_VSOCK Jason Wang
  2016-01-28 15:19 ` Stefan Hajnoczi
  6 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2015-12-22  9:07 UTC (permalink / raw)
  To: kvm
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, netdev, virtualization,
	Matt Benjamin, Asias He, Christoffer Dall, matt.ma

From: Asias He <asias@redhat.com>

Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v4:
 * Make checkpatch.pl happy with longer option description
 * Clarify dependency on virtio rather than QEMU as suggested by Alex
   Bennee
v3:
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann <arnd@arndb.de>)
---
 drivers/vhost/Kconfig  | 15 +++++++++++++++
 drivers/vhost/Makefile |  4 ++++
 net/vmw_vsock/Kconfig  | 19 +++++++++++++++++++
 net/vmw_vsock/Makefile |  2 ++
 4 files changed, 40 insertions(+)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..d7aae9e 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -21,6 +21,21 @@ config VHOST_SCSI
 	Say M here to enable the vhost_scsi TCM fabric module
 	for use with virtio-scsi guests
 
+config VHOST_VSOCK
+	tristate "vhost virtio-vsock driver"
+	depends on VSOCKETS && EVENTFD
+	select VIRTIO_VSOCKETS_COMMON
+	select VHOST
+	select VHOST_RING
+	default n
+	---help---
+	This kernel module can be loaded in the host kernel to provide AF_VSOCK
+	sockets for communicating with guests.  The guests must have the
+	virtio_transport.ko driver loaded to use the virtio-vsock device.
+
+	To compile this driver as a module, choose M here: the module will be called
+	vhost_vsock.
+
 config VHOST_RING
 	tristate
 	---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index e0441c3..6b012b9 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,5 +4,9 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
+obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
+vhost_vsock-y := vsock.o
+
 obj-$(CONFIG_VHOST_RING) += vringh.o
+
 obj-$(CONFIG_VHOST)	+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 14810ab..f27e74b 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,3 +26,22 @@ config VMWARE_VMCI_VSOCKETS
 
 	  To compile this driver as a module, choose M here: the module
 	  will be called vmw_vsock_vmci_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS
+	tristate "virtio transport for Virtual Sockets"
+	depends on VSOCKETS && VIRTIO
+	select VIRTIO_VSOCKETS_COMMON
+	help
+	  This module implements a virtio transport for Virtual Sockets.
+
+	  Enable this transport if your Virtual Machine host supports Virtual
+	  Sockets over virtio.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called virtio_vsock_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS_COMMON
+	tristate
+	---help---
+	  This option is selected by any driver which needs to access
+	  the virtio_vsock common code.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 2ce52d7..cf4c294 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,5 +1,7 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
 
 vsock-y += af_vsock.o vsock_addr.o
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC v4 0/5] Add virtio transport for AF_VSOCK
  2015-12-22  9:07 [RFC v4 0/5] Add virtio transport for AF_VSOCK Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2015-12-22  9:07 ` [RFC v4 5/5] VSOCK: Add Makefile and Kconfig Stefan Hajnoczi
@ 2016-01-04  7:55 ` Jason Wang
  2016-01-07  8:34   ` Stefan Hajnoczi
  2016-01-28 15:19 ` Stefan Hajnoczi
  6 siblings, 1 reply; 9+ messages in thread
From: Jason Wang @ 2016-01-04  7:55 UTC (permalink / raw)
  To: Stefan Hajnoczi, kvm
  Cc: Michael S. Tsirkin, netdev, virtualization, Matt Benjamin,
	Christoffer Dall, matt.ma



On 12/22/2015 05:07 PM, Stefan Hajnoczi wrote:
> [...]
>
> Overview
> --------
> This series adds 3 pieces:
>
> 1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko
>
> 2. virtio_transport.ko - guest driver
>
> 3. drivers/vhost/vsock.ko - host driver

I have a (maybe dumb) question after a quick glance at the code:

Is there any chance to reuse the existing virtio-net/vhost-net code? For
example, using virtio-net instead of a new device as the transport in
the guest, and using vhost-net (especially considering that it uses a
socket as its backend) on the host. Maybe just a new virtio-net header
type for vsock. I'm asking since I don't see any blocker to doing this.

Thanks


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC v4 0/5] Add virtio transport for AF_VSOCK
  2016-01-04  7:55 ` [RFC v4 0/5] Add virtio transport for AF_VSOCK Jason Wang
@ 2016-01-07  8:34   ` Stefan Hajnoczi
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2016-01-07  8:34 UTC (permalink / raw)
  To: Jason Wang
  Cc: Stefan Hajnoczi, kvm, Michael S. Tsirkin, netdev, virtualization,
	Matt Benjamin, Christoffer Dall, matt.ma

On Mon, Jan 04, 2016 at 03:55:35PM +0800, Jason Wang wrote:
> 
> 
> On 12/22/2015 05:07 PM, Stefan Hajnoczi wrote:
> > [...]
> 
> I have a (maybe dumb) question after a quick glance at the code:
> 
> Is there any chance to reuse the existing virtio-net/vhost-net code? For
> example, using virtio-net instead of a new device as the transport in
> the guest, and using vhost-net (especially considering that it uses a
> socket as its backend) on the host. Maybe just a new virtio-net header
> type for vsock. I'm asking since I don't see any blocker to doing this.

virtio-net is an Ethernet device with best-effort delivery, while
virtio-vsock provides guaranteed delivery.  That's the fundamental
difference at the protocol level.

At the driver level, AF_VSOCK isn't supposed to have a netdev or other
user-visible network interface that can be misconfigured by the guest
administrator.

Trying to make the existing virtio-net device work for this use case is
probably nastier than using a dedicated device, since a lot of the
virtio-net configuration space and control features don't make sense for
AF_VSOCK.
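
For what it's worth, here is a sketch (not from this series) of what the
familiar sockets API looks like for a guest application; struct
sockaddr_vm and VMADDR_CID_HOST come from the existing AF_VSOCK support
in <linux/vm_sockets.h>, and the port number is made up:

  #include <string.h>
  #include <sys/socket.h>
  #include <linux/vm_sockets.h>

  /* Guest application connecting to a service listening on the host */
  static int connect_to_host(void)
  {
          struct sockaddr_vm sa;
          int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

          memset(&sa, 0, sizeof(sa));
          sa.svm_family = AF_VSOCK;
          sa.svm_cid = VMADDR_CID_HOST;   /* the host is always CID 2 */
          sa.svm_port = 1234;             /* example port */
          connect(fd, (struct sockaddr *)&sa, sizeof(sa));
          return fd;
  }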

Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC v4 0/5] Add virtio transport for AF_VSOCK
  2015-12-22  9:07 [RFC v4 0/5] Add virtio transport for AF_VSOCK Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2016-01-04  7:55 ` [RFC v4 0/5] Add virtio transport for AF_VSOCK Jason Wang
@ 2016-01-28 15:19 ` Stefan Hajnoczi
  6 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2016-01-28 15:19 UTC (permalink / raw)
  To: kvm
  Cc: Michael S. Tsirkin, netdev, virtualization, Matt Benjamin,
	Christoffer Dall, matt.ma


On Tue, Dec 22, 2015 at 05:07:33PM +0800, Stefan Hajnoczi wrote:
> This series is based on v4.4-rc2 and the "virtio: make find_vqs()
> checkpatch.pl-friendly" patch I recently submitted.
> [...]
Michael and Jason: Ping


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread

Thread overview: 9+ messages
2015-12-22  9:07 [RFC v4 0/5] Add virtio transport for AF_VSOCK Stefan Hajnoczi
2015-12-22  9:07 ` [RFC v4 1/5] VSOCK: transport-specific vsock_transport functions Stefan Hajnoczi
2015-12-22  9:07 ` [RFC v4 2/5] VSOCK: Introduce virtio_vsock_common.ko Stefan Hajnoczi
2015-12-22  9:07 ` [RFC v4 3/5] VSOCK: Introduce virtio_transport.ko Stefan Hajnoczi
2015-12-22  9:07 ` [RFC v4 4/5] VSOCK: Introduce vhost_vsock.ko Stefan Hajnoczi
2015-12-22  9:07 ` [RFC v4 5/5] VSOCK: Add Makefile and Kconfig Stefan Hajnoczi
2016-01-04  7:55 ` [RFC v4 0/5] Add virtio transport for AF_VSOCK Jason Wang
2016-01-07  8:34   ` Stefan Hajnoczi
2016-01-28 15:19 ` Stefan Hajnoczi
