All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] vsockmon: virtual device to monitor AF_VSOCK sockets.
@ 2016-05-28 16:29 ggarcia
  2016-05-28 16:29 ` [RFC 1/3] vsockmon: Add tap functions ggarcia
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: ggarcia @ 2016-05-28 16:29 UTC (permalink / raw)
  To: netdev; +Cc: jhansen, stefanha, Gerard Garcia

From: Gerard Garcia <ggarcia@deic.uab.cat>

Virtual socket transports operate at kernel level therefore, there is no easy way to see the traffic exchanged between virtual machines and hypervisors that communicate using AF_VSOCK sockets. In addition, being able to see the control messages exchanged by the transports may be useful for debugging and optimization purposes. This patch adds a virtual device that may be used to see the traffic exchanged between virtual machines and hypervisors through AF_VSOCK sockets.

Its structure is based on the nlmon device and this version just targets the virtio transport, but support for the VMCI transport can be easily implemented. The vsockmon header consists of two structs: a generic header and a header specific to the transport. The generic header allows to follow an AF_VSOCK stream without having to understand the details of the transport while the transport header gives more detail which may be useful for troubleshooting and debugging.

The repository https://github.com/GerardGarcia/linux/tree/vsockmon implements these patches over the Stefan Hajnoczi vsock-next repository https://github.com/stefanha/linux/tree/vsock-next where the virtio trasnport is implemented. In the repository there is also a simple program that shows the traffic from a vsockmon device: https://github.com/GerardGarcia/linux/blob/vsockmon/vsockmon.c that can be used for testing.

Any thoughts and comments will be greatly appreciated.

Thanks to Stefan Hajnoczi for his help.

Gerard

Gerard Garcia (3):
  vsockmon: Add tap functions.
  vsockmon: Add vsockmon device
  vsockmon: Add vsock hooks

 drivers/net/Kconfig           |   8 ++
 drivers/net/Makefile          |   1 +
 drivers/net/vsockmon.c        | 171 ++++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/vsock.c         |  71 ++++++++++++++++++
 include/net/af_vsock.h        |  13 ++++
 include/uapi/linux/Kbuild     |   1 +
 include/uapi/linux/if_arp.h   |   1 +
 include/uapi/linux/vsockmon.h |  37 +++++++++
 net/vmw_vsock/af_vsock.c      | 105 ++++++++++++++++++++++++++
 9 files changed, 408 insertions(+)
 create mode 100644 drivers/net/vsockmon.c
 create mode 100644 include/uapi/linux/vsockmon.h

-- 
2.8.3

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC 1/3] vsockmon: Add tap functions.
  2016-05-28 16:29 [RFC 0/3] vsockmon: virtual device to monitor AF_VSOCK sockets ggarcia
@ 2016-05-28 16:29 ` ggarcia
  2016-06-01 21:07   ` Stefan Hajnoczi
  2016-05-28 16:29 ` [RFC 2/3] vsockmon: Add vsockmon device ggarcia
  2016-05-28 16:29 ` [RFC 3/3] vsockmon: Add vsock hooks ggarcia
  2 siblings, 1 reply; 13+ messages in thread
From: ggarcia @ 2016-05-28 16:29 UTC (permalink / raw)
  To: netdev; +Cc: jhansen, stefanha, Gerard Garcia

From: Gerard Garcia <ggarcia@deic.uab.cat>

Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
---
 include/net/af_vsock.h      |  13 ++++++
 include/uapi/linux/if_arp.h |   1 +
 net/vmw_vsock/af_vsock.c    | 105 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 15694c9..c4593d8 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -187,4 +187,17 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 					 struct sockaddr_vm *dst);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
 
+/**** TAP ****/
+
+struct vsock_tap {
+	struct net_device *dev;
+	struct module *module;
+	struct list_head list;
+};
+
+extern int vsock_init_tap(void);
+extern int vsock_add_tap(struct vsock_tap *vt);
+extern int vsock_remove_tap(struct vsock_tap *vt);
+extern void vsock_deliver_tap(struct sk_buff *skb);
+
 #endif /* __AF_VSOCK_H__ */
diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
index 4d024d7..869262a 100644
--- a/include/uapi/linux/if_arp.h
+++ b/include/uapi/linux/if_arp.h
@@ -95,6 +95,7 @@
 #define ARPHRD_IP6GRE	823		/* GRE over IPv6		*/
 #define ARPHRD_NETLINK	824		/* Netlink header		*/
 #define ARPHRD_6LOWPAN	825		/* IPv6 over LoWPAN             */
+#define ARPHRD_VSOCKMON 826		/* Vsock monitor header		*/
 
 #define ARPHRD_VOID	  0xFFFF	/* Void type, nothing is known */
 #define ARPHRD_NONE	  0xFFFE	/* zero header length */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 6b158ab..ec7a05d 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -96,6 +96,7 @@
 #include <linux/unistd.h>
 #include <linux/wait.h>
 #include <linux/workqueue.h>
+#include <linux/if_arp.h>
 #include <net/sock.h>
 #include <net/af_vsock.h>
 
@@ -2012,6 +2013,110 @@ const struct vsock_transport *vsock_core_get_transport(void)
 }
 EXPORT_SYMBOL_GPL(vsock_core_get_transport);
 
+/**** TAP ****/
+static DEFINE_SPINLOCK(vsock_tap_lock);
+static struct list_head vsock_tap_all __read_mostly;
+
+int vsock_init_tap(void) {
+	INIT_LIST_HEAD(&vsock_tap_all);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_init_tap);
+
+int vsock_add_tap(struct vsock_tap *vt) {
+	if (unlikely(vt->dev->type != ARPHRD_VSOCKMON))
+		return -EINVAL;
+
+	spin_lock(&vsock_tap_lock);
+	list_add_rcu(&vt->list, &vsock_tap_all);
+	spin_unlock(&vsock_tap_lock);
+
+	__module_get(vt->module);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_add_tap);
+
+int __vsock_remove_tap(struct vsock_tap *vt) {
+	bool found = false;
+	struct vsock_tap *tmp;
+
+	spin_lock(&vsock_tap_lock);
+
+	list_for_each_entry(tmp, &vsock_tap_all, list) {
+		if (vt == tmp) {
+			list_del_rcu(&vt->list);
+			found = true;
+			goto out;
+		}
+	}
+
+	pr_warn("__vsock_remove_tap: %p not found\n", vt);
+out:
+	spin_unlock(&vsock_tap_lock);
+
+	if (found)
+		module_put(vt->module);
+
+	return found ? 0 : -ENODEV;
+}
+
+int vsock_remove_tap(struct vsock_tap *vt)
+{
+	int ret;
+
+	ret = __vsock_remove_tap(vt);
+	synchronize_net();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vsock_remove_tap);
+
+static int __vsock_deliver_tap_skb(struct sk_buff *skb,
+				     struct net_device *dev)
+{
+	struct sk_buff *vskb;
+	int ret = 0;
+
+	if (skb) {
+		dev_hold(dev);
+		vskb = skb_clone(skb, GFP_ATOMIC);
+		vskb->dev = dev;
+		vskb->pkt_type = PACKET_USER;
+		ret = dev_queue_xmit(vskb);
+		if (unlikely(ret > 0))
+			ret = net_xmit_errno(ret);
+
+		dev_put(dev);
+	}
+
+	return ret;
+}
+
+static void __vsock_deliver_tap(struct sk_buff *skb)
+{
+	int ret;
+	struct vsock_tap *tmp;
+
+	list_for_each_entry_rcu(tmp, &vsock_tap_all, list) {
+		ret = __vsock_deliver_tap_skb(skb, tmp->dev);
+		if (unlikely(ret))
+			break;
+	}
+}
+
+void vsock_deliver_tap(struct sk_buff *skb)
+{
+	rcu_read_lock();
+
+	if (unlikely(!list_empty(&vsock_tap_all)))
+		__vsock_deliver_tap(skb);
+
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(vsock_deliver_tap);
+
 MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION("VMware Virtual Socket Family");
 MODULE_VERSION("1.0.1.0-k");
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 2/3] vsockmon: Add vsockmon device
  2016-05-28 16:29 [RFC 0/3] vsockmon: virtual device to monitor AF_VSOCK sockets ggarcia
  2016-05-28 16:29 ` [RFC 1/3] vsockmon: Add tap functions ggarcia
@ 2016-05-28 16:29 ` ggarcia
  2016-06-01 21:15   ` Stefan Hajnoczi
  2016-05-28 16:29 ` [RFC 3/3] vsockmon: Add vsock hooks ggarcia
  2 siblings, 1 reply; 13+ messages in thread
From: ggarcia @ 2016-05-28 16:29 UTC (permalink / raw)
  To: netdev; +Cc: jhansen, stefanha, Gerard Garcia

From: Gerard Garcia <ggarcia@deic.uab.cat>

Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
---
 drivers/net/Kconfig           |   8 ++
 drivers/net/Makefile          |   1 +
 drivers/net/vsockmon.c        | 171 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/Kbuild     |   1 +
 include/uapi/linux/vsockmon.h |  37 +++++++++
 5 files changed, 218 insertions(+)
 create mode 100644 drivers/net/vsockmon.c
 create mode 100644 include/uapi/linux/vsockmon.h

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0c5415b..42c43b6 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -330,6 +330,14 @@ config NET_VRF
 	  This option enables the support for mapping interfaces into VRF's. The
 	  support enables VRF devices.
 
+config VSOCKMON
+    tristate "Virtual vsock monitoring device"
+    depends on VHOST_VSOCK
+    ---help---
+     This option enables a monitoring net device for vsock sockets. It is
+     mostly intended for developers or support to debug vsock issues. If
+     unsure, say N.
+
 endif # NET_CORE
 
 config SUNGEM_PHY
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7336cbd..e2188d4 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -28,6 +28,7 @@ obj-$(CONFIG_GENEVE) += geneve.o
 obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
+obj-$(CONFIG_VSOCKMON) += vsockmon.o
 
 #
 # Networking Drivers
diff --git a/drivers/net/vsockmon.c b/drivers/net/vsockmon.c
new file mode 100644
index 0000000..becddc9
--- /dev/null
+++ b/drivers/net/vsockmon.c
@@ -0,0 +1,171 @@
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/if_arp.h>
+#include <net/rtnetlink.h>
+#include <net/sock.h>
+#include <net/af_vsock.h>
+#include <uapi/linux/vsockmon.h>
+
+/* Virtio transport max packet size plus header */
+#define DEFAULT_MTU 1024 * 64 + sizeof(struct af_vsockmon_hdr);
+
+struct pcpu_lstats {
+	u64 rx_packets;
+	u64 rx_bytes;
+	struct u64_stats_sync syncp;
+};
+
+static int vsockmon_dev_init(struct net_device *dev)
+{
+	dev->lstats = netdev_alloc_pcpu_stats(struct pcpu_lstats);
+	return dev->lstats == NULL ? -ENOMEM : 0;
+}
+
+static void vsockmon_dev_uninit(struct net_device *dev)
+{
+	free_percpu(dev->lstats);
+}
+
+struct vsockmon {
+	struct vsock_tap vt;
+};
+
+static int vsockmon_open(struct net_device *dev)
+{
+	struct vsockmon *vsockmon = netdev_priv(dev);
+
+	vsockmon->vt.dev = dev;
+	vsockmon->vt.module = THIS_MODULE;
+	return vsock_add_tap(&vsockmon->vt);
+}
+
+static int vsockmon_close(struct net_device *dev) {
+	struct vsockmon *vsockmon = netdev_priv(dev);
+
+	return vsock_remove_tap(&vsockmon->vt);
+}
+
+static netdev_tx_t vsockmon_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	int len = skb->len;
+	struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);
+
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_bytes += len;
+	stats->rx_packets++;
+	u64_stats_update_end(&stats->syncp);
+
+	dev_kfree_skb(skb);
+
+	return NETDEV_TX_OK;
+}
+
+static struct rtnl_link_stats64 *
+vsockmon_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
+{
+	int i;
+	u64 bytes = 0, packets = 0;
+
+	for_each_possible_cpu(i) {
+		const struct pcpu_lstats *vstats;
+		u64 tbytes, tpackets;
+		unsigned int start;
+
+		vstats = per_cpu_ptr(dev->lstats, i);
+
+		do {
+			start = u64_stats_fetch_begin_irq(&vstats->syncp);
+			tbytes = vstats->rx_bytes;
+			tpackets = vstats->rx_packets;
+		} while (u64_stats_fetch_retry_irq(&vstats->syncp, start));
+
+		packets += tpackets;
+		bytes += tbytes;
+	}
+
+	stats->rx_packets = packets;
+	stats->tx_packets = 0;
+
+	stats->rx_bytes = bytes;
+	stats->tx_bytes = 0;
+
+	return stats;
+}
+
+static int vsockmon_is_valid_mtu(int new_mtu)
+{
+	return new_mtu >= (int) sizeof(struct af_vsockmon_hdr);
+}
+
+static int vsockmon_change_mtu(struct net_device *dev, int new_mtu)
+{
+	if (!vsockmon_is_valid_mtu(new_mtu))
+		return -EINVAL;
+
+	dev->mtu = new_mtu;
+	return 0;
+}
+
+static const struct net_device_ops vsockmon_ops = {
+	.ndo_init = vsockmon_dev_init,
+	.ndo_uninit = vsockmon_dev_uninit,
+	.ndo_open = vsockmon_open,
+	.ndo_stop = vsockmon_close,
+	.ndo_start_xmit = vsockmon_xmit,
+	.ndo_get_stats64 = vsockmon_get_stats64,
+	.ndo_change_mtu = vsockmon_change_mtu,
+};
+
+static u32 always_on(struct net_device *dev)
+{
+	return 1;
+}
+
+static const struct ethtool_ops vsockmon_ethtool_ops = {
+	.get_link = always_on,
+};
+
+static void vsockmon_setup(struct net_device *dev)
+{
+	dev->type = ARPHRD_VSOCKMON;
+	dev->priv_flags |= IFF_NO_QUEUE;
+
+	dev->netdev_ops	= &vsockmon_ops;
+	dev->ethtool_ops = &vsockmon_ethtool_ops;
+	dev->destructor	= free_netdev;
+
+	dev->features = NETIF_F_SG | NETIF_F_FRAGLIST |
+			NETIF_F_HIGHDMA | NETIF_F_LLTX;
+
+	dev->flags = IFF_NOARP;
+
+	dev->mtu = DEFAULT_MTU;
+}
+
+static struct rtnl_link_ops vsockmon_link_ops __read_mostly = {
+	.kind			= "vsockmon",
+	.priv_size		= sizeof(struct vsockmon),
+	.setup			= vsockmon_setup,
+};
+
+static __init int vsockmon_register(void)
+{
+	int ret = rtnl_link_register(&vsockmon_link_ops);
+	if (!ret)
+		vsock_init_tap();
+
+	return ret;
+}
+
+static __exit void vsockmon_unregister(void)
+{
+	rtnl_link_unregister(&vsockmon_link_ops);
+}
+
+module_init(vsockmon_register);
+module_exit(vsockmon_unregister);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Gerard Garcia <ggarcia@deic.uab.cat>");
+MODULE_DESCRIPTION("Vsock monitoring device");
+MODULE_ALIAS_RTNL_LINK("vsockmon");
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 5f047d2..b1836cc 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -454,6 +454,7 @@ header-y += virtio_scsi.h
 header-y += virtio_types.h
 header-y += virtio_vsock.h
 header-y += vm_sockets.h
+header-y += vsockmon.h
 header-y += vt.h
 header-y += wait.h
 header-y += wanrouter.h
diff --git a/include/uapi/linux/vsockmon.h b/include/uapi/linux/vsockmon.h
new file mode 100644
index 0000000..c73166f
--- /dev/null
+++ b/include/uapi/linux/vsockmon.h
@@ -0,0 +1,37 @@
+#ifndef _UAPI_VSOCKMON_H
+#define _UAPI_VSOCKMON_H
+
+#include <linux/virtio_vsock.h>
+
+/* Packet structure of packets received from the vsockmon device. */
+
+struct af_vsockmon_g {
+	unsigned short op; /* enum af_vsock_g_ops */
+	unsigned int src_cid;
+	unsigned int src_port;
+	unsigned int dst_cid;
+	unsigned int dst_port;
+};
+
+struct af_vsockmon_hdr {
+	unsigned short type;  /* enum af_vosck_type */
+	struct af_vsockmon_g g_hdr;
+	union {
+		struct virtio_vsock_hdr virtio_hdr;
+	} t_hdr;
+};
+
+enum af_vsockmon_type {
+	AF_VSOCK_GENERIC = 1, /* No transport header */
+	AF_VSOCK_VIRTIO = 2,  /* Addtional virtio transport header */
+};
+
+enum af_vsockmon_g_ops {
+	AF_VSOCK_G_OP_UNKNOWN = 0,
+	AF_VSOCK_G_OP_CONNECT = 1,
+	AF_VSOCK_G_OP_DISCONNECT = 2,
+	AF_VSOCK_G_OP_CONTROL = 3,
+	AF_VSOCK_G_OP_PAYLOAD = 4,
+};
+
+#endif
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 3/3] vsockmon: Add vsock hooks
  2016-05-28 16:29 [RFC 0/3] vsockmon: virtual device to monitor AF_VSOCK sockets ggarcia
  2016-05-28 16:29 ` [RFC 1/3] vsockmon: Add tap functions ggarcia
  2016-05-28 16:29 ` [RFC 2/3] vsockmon: Add vsockmon device ggarcia
@ 2016-05-28 16:29 ` ggarcia
  2016-06-01 21:19   ` Stefan Hajnoczi
  2 siblings, 1 reply; 13+ messages in thread
From: ggarcia @ 2016-05-28 16:29 UTC (permalink / raw)
  To: netdev; +Cc: jhansen, stefanha, Gerard Garcia

From: Gerard Garcia <ggarcia@deic.uab.cat>

Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
---
 drivers/vhost/vsock.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 17bfe4e..8fd5125 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -14,8 +14,10 @@
 #include <net/sock.h>
 #include <linux/virtio_vsock.h>
 #include <linux/vhost.h>
+#include <linux/skbuff.h>
 
 #include <net/af_vsock.h>
+#include <uapi/linux/vsockmon.h>
 #include "vhost.h"
 
 #define VHOST_VSOCK_DEFAULT_HOST_CID	2
@@ -45,6 +47,67 @@ struct vhost_vsock {
 	u32 guest_cid;
 };
 
+static struct sk_buff *
+virtio_vsock_pkt_to_skb(struct virtio_vsock_pkt *pkt)
+{
+	struct sk_buff *skb;
+	struct af_vsockmon_hdr *hdr;
+	void *payload;
+
+	u32 skb_len = sizeof(struct af_vsockmon_hdr) + pkt->len;
+
+	skb = alloc_skb(skb_len, GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+
+	skb_reserve(skb, sizeof(struct af_vsockmon_hdr));
+
+	if (pkt->len) {
+		payload = skb_put(skb, pkt->len);
+		memcpy(payload, pkt->buf, pkt->len);
+	}
+
+	hdr = (struct af_vsockmon_hdr *) skb_push(skb, sizeof(*hdr));
+	hdr->type = AF_VSOCK_VIRTIO;
+
+	switch(pkt->hdr.op) {
+		case VIRTIO_VSOCK_OP_REQUEST:
+		case VIRTIO_VSOCK_OP_RESPONSE:
+			hdr->g_hdr.op = AF_VSOCK_G_OP_CONNECT;
+			break;
+		case VIRTIO_VSOCK_OP_RST:
+		case VIRTIO_VSOCK_OP_SHUTDOWN:
+			hdr->g_hdr.op = AF_VSOCK_G_OP_DISCONNECT;
+			break;
+		case VIRTIO_VSOCK_OP_RW:
+			hdr->g_hdr.op = AF_VSOCK_G_OP_PAYLOAD;
+			break;
+		case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
+		case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
+			hdr->g_hdr.op = AF_VSOCK_G_OP_CONTROL;
+			break;
+		default:
+			hdr->g_hdr.op = AF_VSOCK_G_OP_UNKNOWN;
+			break;
+	}
+
+	hdr->g_hdr.src_cid = pkt->hdr.src_cid;
+	hdr->g_hdr.src_port = pkt->hdr.src_port;
+	hdr->g_hdr.dst_cid = pkt->hdr.dst_cid;
+	hdr->g_hdr.dst_port = pkt->hdr.dst_port;
+
+	hdr->t_hdr.virtio_hdr = pkt->hdr;
+
+	return skb;
+}
+
+static void vsock_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
+{
+	struct sk_buff *skb = virtio_vsock_pkt_to_skb(pkt);
+	if (skb)
+		vsock_deliver_tap(skb);
+}
+
 static u32 vhost_transport_get_local_cid(void)
 {
 	return VHOST_VSOCK_DEFAULT_HOST_CID;
@@ -147,6 +210,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 
 		vsock->total_tx_buf -= pkt->len;
 
+		/* Deliver to monitoring devices all correctly transmitted
+		 * packets.
+		 */
+		vsock_deliver_tap_pkt(pkt);
+
 		virtio_transport_free_pkt(pkt);
 	}
 	if (added)
@@ -367,6 +435,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			continue;
 		}
 
+		/* Deliver to monitoring devices all received packets */
+		vsock_deliver_tap_pkt(pkt);
+
 		/* Only accept correctly addressed packets */
 		if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
 			virtio_transport_recv_pkt(pkt);
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC 1/3] vsockmon: Add tap functions.
  2016-05-28 16:29 ` [RFC 1/3] vsockmon: Add tap functions ggarcia
@ 2016-06-01 21:07   ` Stefan Hajnoczi
  2016-06-09 15:02     ` Gerard Garcia
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2016-06-01 21:07 UTC (permalink / raw)
  To: ggarcia; +Cc: netdev, jhansen

[-- Attachment #1: Type: text/plain, Size: 6419 bytes --]

On Sat, May 28, 2016 at 06:29:05PM +0200, ggarcia@abra.uab.cat wrote:
> From: Gerard Garcia <ggarcia@deic.uab.cat>
> 
> Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
> ---
>  include/net/af_vsock.h      |  13 ++++++
>  include/uapi/linux/if_arp.h |   1 +
>  net/vmw_vsock/af_vsock.c    | 105 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 119 insertions(+)
> 
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 15694c9..c4593d8 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -187,4 +187,17 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>  					 struct sockaddr_vm *dst);
>  void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
>  
> +/**** TAP ****/
> +
> +struct vsock_tap {
> +	struct net_device *dev;
> +	struct module *module;
> +	struct list_head list;
> +};
> +
> +extern int vsock_init_tap(void);
> +extern int vsock_add_tap(struct vsock_tap *vt);
> +extern int vsock_remove_tap(struct vsock_tap *vt);
> +extern void vsock_deliver_tap(struct sk_buff *skb);

The other function prototypes in this header don't use "extern" either.
Please drop for consistency.

> +
>  #endif /* __AF_VSOCK_H__ */
> diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
> index 4d024d7..869262a 100644
> --- a/include/uapi/linux/if_arp.h
> +++ b/include/uapi/linux/if_arp.h
> @@ -95,6 +95,7 @@
>  #define ARPHRD_IP6GRE	823		/* GRE over IPv6		*/
>  #define ARPHRD_NETLINK	824		/* Netlink header		*/
>  #define ARPHRD_6LOWPAN	825		/* IPv6 over LoWPAN             */
> +#define ARPHRD_VSOCKMON 826		/* Vsock monitor header		*/

Previous lines use a tab character (^I) before the numeric constant.
Please follow that style for consistency.

I suggest calling it just ARPHRD_VSOCK.  Netlink also uses
ARPHRD_NETLINK instead of ARPHRD_NLMON.

>  
>  #define ARPHRD_VOID	  0xFFFF	/* Void type, nothing is known */
>  #define ARPHRD_NONE	  0xFFFE	/* zero header length */
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 6b158ab..ec7a05d 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -96,6 +96,7 @@
>  #include <linux/unistd.h>
>  #include <linux/wait.h>
>  #include <linux/workqueue.h>
> +#include <linux/if_arp.h>
>  #include <net/sock.h>
>  #include <net/af_vsock.h>
>  
> @@ -2012,6 +2013,110 @@ const struct vsock_transport *vsock_core_get_transport(void)
>  }
>  EXPORT_SYMBOL_GPL(vsock_core_get_transport);
>  
> +/**** TAP ****/

Feel free to put this in a separate source file.  The Kbuild can link
multiple objects into a single kernel module.  That would be cleaner
than using a big comment to separate it from af_vsock.c code.

I wonder whether this tap mechanism should be generic so that netlink,
vsock, and other address families can reuse the code.  This is basically
copied from netlink.

> +static DEFINE_SPINLOCK(vsock_tap_lock);
> +static struct list_head vsock_tap_all __read_mostly;
> +
> +int vsock_init_tap(void) {
> +	INIT_LIST_HEAD(&vsock_tap_all);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(vsock_init_tap);

Can you use LIST_HEAD_INIT() on the vsock_tap_all declaration line to
eliminate the need for vsock_init_tap() completely?

> +
> +int vsock_add_tap(struct vsock_tap *vt) {
> +	if (unlikely(vt->dev->type != ARPHRD_VSOCKMON))
> +		return -EINVAL;
> +
> +	spin_lock(&vsock_tap_lock);
> +	list_add_rcu(&vt->list, &vsock_tap_all);
> +	spin_unlock(&vsock_tap_lock);
> +
> +	__module_get(vt->module);

It's slightly safer to get the module before publishing it on the list.
But in practice I guess the caller is the module so the module won't
disappear underneath us.

> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(vsock_add_tap);
> +
> +int __vsock_remove_tap(struct vsock_tap *vt) {
> +	bool found = false;
> +	struct vsock_tap *tmp;
> +
> +	spin_lock(&vsock_tap_lock);
> +
> +	list_for_each_entry(tmp, &vsock_tap_all, list) {
> +		if (vt == tmp) {
> +			list_del_rcu(&vt->list);
> +			found = true;
> +			goto out;
> +		}
> +	}
> +
> +	pr_warn("__vsock_remove_tap: %p not found\n", vt);
> +out:
> +	spin_unlock(&vsock_tap_lock);
> +
> +	if (found)
> +		module_put(vt->module);
> +
> +	return found ? 0 : -ENODEV;
> +}
> +
> +int vsock_remove_tap(struct vsock_tap *vt)
> +{
> +	int ret;
> +
> +	ret = __vsock_remove_tap(vt);
> +	synchronize_net();

module_put() should be called after synchronize_net().  That way we
guarantee that no one is still accessing vt when the module is put.  The
caller is probably the module though, so maybe there is an implicit
reference...

> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(vsock_remove_tap);
> +
> +static int __vsock_deliver_tap_skb(struct sk_buff *skb,
> +				     struct net_device *dev)
> +{
> +	struct sk_buff *vskb;
> +	int ret = 0;
> +
> +	if (skb) {
> +		dev_hold(dev);
> +		vskb = skb_clone(skb, GFP_ATOMIC);

You must handle the case where skb_clone() returns NULL.  In other
words, we don't have enough memory to capture the packet and just return
-ENOMEM.

> +		vskb->dev = dev;
> +		vskb->pkt_type = PACKET_USER;

I'm not sure if PACKET_USER is correct (or if pkt_type matters at all).

PACKET_USER/PACKET_KERNEL seems to be purely a netlink concept to
distinguish netlink packets originating from the kernel or userspace.

AF_VSOCK is more like Ethernet where packets come from the host itself
or from a VM.  I would consider PACKET_HOST and PACKET_OTHERHOST.

> +		ret = dev_queue_xmit(vskb);
> +		if (unlikely(ret > 0))
> +			ret = net_xmit_errno(ret);
> +
> +		dev_put(dev);
> +	}
> +
> +	return ret;
> +}
> +
> +static void __vsock_deliver_tap(struct sk_buff *skb)
> +{
> +	int ret;
> +	struct vsock_tap *tmp;
> +
> +	list_for_each_entry_rcu(tmp, &vsock_tap_all, list) {
> +		ret = __vsock_deliver_tap_skb(skb, tmp->dev);
> +		if (unlikely(ret))
> +			break;
> +	}
> +}
> +
> +void vsock_deliver_tap(struct sk_buff *skb)
> +{
> +	rcu_read_lock();
> +
> +	if (unlikely(!list_empty(&vsock_tap_all)))
> +		__vsock_deliver_tap(skb);
> +
> +	rcu_read_unlock();
> +}
> +EXPORT_SYMBOL_GPL(vsock_deliver_tap);
> +
>  MODULE_AUTHOR("VMware, Inc.");
>  MODULE_DESCRIPTION("VMware Virtual Socket Family");
>  MODULE_VERSION("1.0.1.0-k");
> -- 
> 2.8.3
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 2/3] vsockmon: Add vsockmon device
  2016-05-28 16:29 ` [RFC 2/3] vsockmon: Add vsockmon device ggarcia
@ 2016-06-01 21:15   ` Stefan Hajnoczi
  2016-06-09 15:21     ` Gerard Garcia
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2016-06-01 21:15 UTC (permalink / raw)
  To: ggarcia; +Cc: netdev, jhansen

[-- Attachment #1: Type: text/plain, Size: 8115 bytes --]

On Sat, May 28, 2016 at 06:29:06PM +0200, ggarcia@abra.uab.cat wrote:
> From: Gerard Garcia <ggarcia@deic.uab.cat>
> 
> Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
> ---
>  drivers/net/Kconfig           |   8 ++
>  drivers/net/Makefile          |   1 +
>  drivers/net/vsockmon.c        | 171 ++++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/Kbuild     |   1 +
>  include/uapi/linux/vsockmon.h |  37 +++++++++
>  5 files changed, 218 insertions(+)
>  create mode 100644 drivers/net/vsockmon.c
>  create mode 100644 include/uapi/linux/vsockmon.h
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 0c5415b..42c43b6 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -330,6 +330,14 @@ config NET_VRF
>  	  This option enables the support for mapping interfaces into VRF's. The
>  	  support enables VRF devices.
>  
> +config VSOCKMON
> +    tristate "Virtual vsock monitoring device"
> +    depends on VHOST_VSOCK
> +    ---help---
> +     This option enables a monitoring net device for vsock sockets. It is
> +     mostly intended for developers or support to debug vsock issues. If
> +     unsure, say N.
> +
>  endif # NET_CORE
>  
>  config SUNGEM_PHY
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 7336cbd..e2188d4 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -28,6 +28,7 @@ obj-$(CONFIG_GENEVE) += geneve.o
>  obj-$(CONFIG_GTP) += gtp.o
>  obj-$(CONFIG_NLMON) += nlmon.o
>  obj-$(CONFIG_NET_VRF) += vrf.o
> +obj-$(CONFIG_VSOCKMON) += vsockmon.o
>  
>  #
>  # Networking Drivers
> diff --git a/drivers/net/vsockmon.c b/drivers/net/vsockmon.c
> new file mode 100644
> index 0000000..becddc9
> --- /dev/null
> +++ b/drivers/net/vsockmon.c
> @@ -0,0 +1,171 @@
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/if_arp.h>
> +#include <net/rtnetlink.h>
> +#include <net/sock.h>
> +#include <net/af_vsock.h>
> +#include <uapi/linux/vsockmon.h>
> +
> +/* Virtio transport max packet size plus header */
> +#define DEFAULT_MTU 1024 * 64 + sizeof(struct af_vsockmon_hdr);
> +
> +struct pcpu_lstats {
> +	u64 rx_packets;
> +	u64 rx_bytes;
> +	struct u64_stats_sync syncp;
> +};
> +
> +static int vsockmon_dev_init(struct net_device *dev)
> +{
> +	dev->lstats = netdev_alloc_pcpu_stats(struct pcpu_lstats);
> +	return dev->lstats == NULL ? -ENOMEM : 0;
> +}
> +
> +static void vsockmon_dev_uninit(struct net_device *dev)
> +{
> +	free_percpu(dev->lstats);
> +}
> +
> +struct vsockmon {
> +	struct vsock_tap vt;
> +};
> +
> +static int vsockmon_open(struct net_device *dev)
> +{
> +	struct vsockmon *vsockmon = netdev_priv(dev);
> +
> +	vsockmon->vt.dev = dev;
> +	vsockmon->vt.module = THIS_MODULE;
> +	return vsock_add_tap(&vsockmon->vt);
> +}
> +
> +static int vsockmon_close(struct net_device *dev) {
> +	struct vsockmon *vsockmon = netdev_priv(dev);
> +
> +	return vsock_remove_tap(&vsockmon->vt);
> +}
> +
> +static netdev_tx_t vsockmon_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	int len = skb->len;
> +	struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->rx_bytes += len;
> +	stats->rx_packets++;
> +	u64_stats_update_end(&stats->syncp);
> +
> +	dev_kfree_skb(skb);
> +
> +	return NETDEV_TX_OK;
> +}
> +
> +static struct rtnl_link_stats64 *
> +vsockmon_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> +{
> +	int i;
> +	u64 bytes = 0, packets = 0;
> +
> +	for_each_possible_cpu(i) {
> +		const struct pcpu_lstats *vstats;
> +		u64 tbytes, tpackets;
> +		unsigned int start;
> +
> +		vstats = per_cpu_ptr(dev->lstats, i);
> +
> +		do {
> +			start = u64_stats_fetch_begin_irq(&vstats->syncp);
> +			tbytes = vstats->rx_bytes;
> +			tpackets = vstats->rx_packets;
> +		} while (u64_stats_fetch_retry_irq(&vstats->syncp, start));
> +
> +		packets += tpackets;
> +		bytes += tbytes;
> +	}
> +
> +	stats->rx_packets = packets;
> +	stats->tx_packets = 0;
> +
> +	stats->rx_bytes = bytes;
> +	stats->tx_bytes = 0;
> +
> +	return stats;
> +}
> +
> +static int vsockmon_is_valid_mtu(int new_mtu)
> +{
> +	return new_mtu >= (int) sizeof(struct af_vsockmon_hdr);
> +}
> +
> +static int vsockmon_change_mtu(struct net_device *dev, int new_mtu)
> +{
> +	if (!vsockmon_is_valid_mtu(new_mtu))
> +		return -EINVAL;
> +
> +	dev->mtu = new_mtu;
> +	return 0;
> +}

I wonder if the mtu serves any purpose.  What happens when you change it
from the default value?

> +
> +static const struct net_device_ops vsockmon_ops = {
> +	.ndo_init = vsockmon_dev_init,
> +	.ndo_uninit = vsockmon_dev_uninit,
> +	.ndo_open = vsockmon_open,
> +	.ndo_stop = vsockmon_close,
> +	.ndo_start_xmit = vsockmon_xmit,
> +	.ndo_get_stats64 = vsockmon_get_stats64,
> +	.ndo_change_mtu = vsockmon_change_mtu,
> +};
> +
> +static u32 always_on(struct net_device *dev)
> +{
> +	return 1;
> +}
> +
> +static const struct ethtool_ops vsockmon_ethtool_ops = {
> +	.get_link = always_on,
> +};
> +
> +static void vsockmon_setup(struct net_device *dev)
> +{
> +	dev->type = ARPHRD_VSOCKMON;
> +	dev->priv_flags |= IFF_NO_QUEUE;
> +
> +	dev->netdev_ops	= &vsockmon_ops;
> +	dev->ethtool_ops = &vsockmon_ethtool_ops;
> +	dev->destructor	= free_netdev;
> +
> +	dev->features = NETIF_F_SG | NETIF_F_FRAGLIST |
> +			NETIF_F_HIGHDMA | NETIF_F_LLTX;
> +
> +	dev->flags = IFF_NOARP;
> +
> +	dev->mtu = DEFAULT_MTU;
> +}
> +
> +static struct rtnl_link_ops vsockmon_link_ops __read_mostly = {
> +	.kind			= "vsockmon",
> +	.priv_size		= sizeof(struct vsockmon),
> +	.setup			= vsockmon_setup,
> +};
> +
> +static __init int vsockmon_register(void)
> +{
> +	int ret = rtnl_link_register(&vsockmon_link_ops);
> +	if (!ret)
> +		vsock_init_tap();
> +
> +	return ret;
> +}
> +
> +static __exit void vsockmon_unregister(void)
> +{
> +	rtnl_link_unregister(&vsockmon_link_ops);
> +}
> +
> +module_init(vsockmon_register);
> +module_exit(vsockmon_unregister);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Gerard Garcia <ggarcia@deic.uab.cat>");
> +MODULE_DESCRIPTION("Vsock monitoring device");
> +MODULE_ALIAS_RTNL_LINK("vsockmon");

there should be some mention that this is derived from nlmon.c.

> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index 5f047d2..b1836cc 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -454,6 +454,7 @@ header-y += virtio_scsi.h
>  header-y += virtio_types.h
>  header-y += virtio_vsock.h
>  header-y += vm_sockets.h
> +header-y += vsockmon.h
>  header-y += vt.h
>  header-y += wait.h
>  header-y += wanrouter.h
> diff --git a/include/uapi/linux/vsockmon.h b/include/uapi/linux/vsockmon.h
> new file mode 100644
> index 0000000..c73166f
> --- /dev/null
> +++ b/include/uapi/linux/vsockmon.h
> @@ -0,0 +1,37 @@
> +#ifndef _UAPI_VSOCKMON_H
> +#define _UAPI_VSOCKMON_H
> +
> +#include <linux/virtio_vsock.h>
> +
> +/* Packet structure of packets received from the vsockmon device. */
> +
> +struct af_vsockmon_g {
> +	unsigned short op; /* enum af_vsock_g_ops */
> +	unsigned int src_cid;
> +	unsigned int src_port;
> +	unsigned int dst_cid;
> +	unsigned int dst_port;
> +};
> +
> +struct af_vsockmon_hdr {
> +	unsigned short type;  /* enum af_vosck_type */
> +	struct af_vsockmon_g g_hdr;
> +	union {
> +		struct virtio_vsock_hdr virtio_hdr;
> +	} t_hdr;
> +};

How does endianness work?  virtio_hdr uses little-endian fields on the
wire.  I guess that af_vsockmon_g is always CPU-endian.

> +
> +enum af_vsockmon_type {
> +	AF_VSOCK_GENERIC = 1, /* No transport header */
> +	AF_VSOCK_VIRTIO = 2,  /* Addtional virtio transport header */
> +};
> +
> +enum af_vsockmon_g_ops {
> +	AF_VSOCK_G_OP_UNKNOWN = 0,
> +	AF_VSOCK_G_OP_CONNECT = 1,
> +	AF_VSOCK_G_OP_DISCONNECT = 2,
> +	AF_VSOCK_G_OP_CONTROL = 3,
> +	AF_VSOCK_G_OP_PAYLOAD = 4,
> +};
> +
> +#endif
> -- 
> 2.8.3
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 3/3] vsockmon: Add vsock hooks
  2016-05-28 16:29 ` [RFC 3/3] vsockmon: Add vsock hooks ggarcia
@ 2016-06-01 21:19   ` Stefan Hajnoczi
  2016-06-09 15:27     ` Gerard Garcia
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2016-06-01 21:19 UTC (permalink / raw)
  To: ggarcia; +Cc: netdev, jhansen

[-- Attachment #1: Type: text/plain, Size: 3638 bytes --]

On Sat, May 28, 2016 at 06:29:07PM +0200, ggarcia@abra.uab.cat wrote:
> From: Gerard Garcia <ggarcia@deic.uab.cat>
> 
> Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
> ---
>  drivers/vhost/vsock.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 17bfe4e..8fd5125 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -14,8 +14,10 @@
>  #include <net/sock.h>
>  #include <linux/virtio_vsock.h>
>  #include <linux/vhost.h>
> +#include <linux/skbuff.h>
>  
>  #include <net/af_vsock.h>
> +#include <uapi/linux/vsockmon.h>
>  #include "vhost.h"
>  
>  #define VHOST_VSOCK_DEFAULT_HOST_CID	2
> @@ -45,6 +47,67 @@ struct vhost_vsock {
>  	u32 guest_cid;
>  };
>  
> +static struct sk_buff *
> +virtio_vsock_pkt_to_skb(struct virtio_vsock_pkt *pkt)
> +{
> +	struct sk_buff *skb;
> +	struct af_vsockmon_hdr *hdr;
> +	void *payload;
> +
> +	u32 skb_len = sizeof(struct af_vsockmon_hdr) + pkt->len;
> +
> +	skb = alloc_skb(skb_len, GFP_ATOMIC);
> +	if (!skb)
> +		return NULL;
> +
> +	skb_reserve(skb, sizeof(struct af_vsockmon_hdr));
> +
> +	if (pkt->len) {
> +		payload = skb_put(skb, pkt->len);
> +		memcpy(payload, pkt->buf, pkt->len);
> +	}
> +
> +	hdr = (struct af_vsockmon_hdr *) skb_push(skb, sizeof(*hdr));
> +	hdr->type = AF_VSOCK_VIRTIO;
> +
> +	switch(pkt->hdr.op) {
> +		case VIRTIO_VSOCK_OP_REQUEST:
> +		case VIRTIO_VSOCK_OP_RESPONSE:
> +			hdr->g_hdr.op = AF_VSOCK_G_OP_CONNECT;
> +			break;
> +		case VIRTIO_VSOCK_OP_RST:
> +		case VIRTIO_VSOCK_OP_SHUTDOWN:
> +			hdr->g_hdr.op = AF_VSOCK_G_OP_DISCONNECT;
> +			break;
> +		case VIRTIO_VSOCK_OP_RW:
> +			hdr->g_hdr.op = AF_VSOCK_G_OP_PAYLOAD;
> +			break;
> +		case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
> +		case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
> +			hdr->g_hdr.op = AF_VSOCK_G_OP_CONTROL;
> +			break;
> +		default:
> +			hdr->g_hdr.op = AF_VSOCK_G_OP_UNKNOWN;
> +			break;
> +	}
> +
> +	hdr->g_hdr.src_cid = pkt->hdr.src_cid;
> +	hdr->g_hdr.src_port = pkt->hdr.src_port;
> +	hdr->g_hdr.dst_cid = pkt->hdr.dst_cid;
> +	hdr->g_hdr.dst_port = pkt->hdr.dst_port;

Careful with endianness here.  pkt->hdr uses little-endian fields.

> +
> +	hdr->t_hdr.virtio_hdr = pkt->hdr;
> +
> +	return skb;
> +}
> +
> +static void vsock_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
> +{
> +	struct sk_buff *skb = virtio_vsock_pkt_to_skb(pkt);
> +	if (skb)
> +		vsock_deliver_tap(skb);

vsock_deliver_tap() doesn't take ownership of skb so it is leaked here!

In fact, given that the core vsock code is not skb-based, perhaps the
tap code shouldn't clone at all.  Just take ownership of the skb.  This
avoids the extra allocation and memcpy.

> +}
> +
>  static u32 vhost_transport_get_local_cid(void)
>  {
>  	return VHOST_VSOCK_DEFAULT_HOST_CID;
> @@ -147,6 +210,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  
>  		vsock->total_tx_buf -= pkt->len;
>  
> +		/* Deliver to monitoring devices all correctly transmitted
> +		 * packets.
> +		 */
> +		vsock_deliver_tap_pkt(pkt);
> +
>  		virtio_transport_free_pkt(pkt);
>  	}
>  	if (added)
> @@ -367,6 +435,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
>  			continue;
>  		}
>  
> +		/* Deliver to monitoring devices all received packets */
> +		vsock_deliver_tap_pkt(pkt);
> +
>  		/* Only accept correctly addressed packets */
>  		if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
>  			virtio_transport_recv_pkt(pkt);
> -- 
> 2.8.3
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 1/3] vsockmon: Add tap functions.
  2016-06-01 21:07   ` Stefan Hajnoczi
@ 2016-06-09 15:02     ` Gerard Garcia
  2016-06-10 15:44       ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Gerard Garcia @ 2016-06-09 15:02 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev, jhansen



On 06/01/2016 11:07 PM, Stefan Hajnoczi wrote:
> On Sat, May 28, 2016 at 06:29:05PM +0200, ggarcia@abra.uab.cat wrote:
>> From: Gerard Garcia <ggarcia@deic.uab.cat>
>>
>> Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
>> ---
>>   include/net/af_vsock.h      |  13 ++++++
>>   include/uapi/linux/if_arp.h |   1 +
>>   net/vmw_vsock/af_vsock.c    | 105 ++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 119 insertions(+)
>>
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index 15694c9..c4593d8 100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -187,4 +187,17 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>>   					 struct sockaddr_vm *dst);
>>   void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
>>   
>> +/**** TAP ****/
>> +
>> +struct vsock_tap {
>> +	struct net_device *dev;
>> +	struct module *module;
>> +	struct list_head list;
>> +};
>> +
>> +extern int vsock_init_tap(void);
>> +extern int vsock_add_tap(struct vsock_tap *vt);
>> +extern int vsock_remove_tap(struct vsock_tap *vt);
>> +extern void vsock_deliver_tap(struct sk_buff *skb);
> The other function prototypes in this header don't use "extern" either.
> Please drop for consistency.
Ok
>> +
>>   #endif /* __AF_VSOCK_H__ */
>> diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
>> index 4d024d7..869262a 100644
>> --- a/include/uapi/linux/if_arp.h
>> +++ b/include/uapi/linux/if_arp.h
>> @@ -95,6 +95,7 @@
>>   #define ARPHRD_IP6GRE	823		/* GRE over IPv6		*/
>>   #define ARPHRD_NETLINK	824		/* Netlink header		*/
>>   #define ARPHRD_6LOWPAN	825		/* IPv6 over LoWPAN             */
>> +#define ARPHRD_VSOCKMON 826		/* Vsock monitor header		*/
> Previous lines use a tab character (^I) before the numeric constant.
> Please follow that style for consistency.
Ok
> I suggest calling it just ARPHRD_VSOCK.  Netlink also uses
> ARPHRD_NETLINK instead of ARPHRD_NLMON.
We define a new header used exclusively by the vsockmon device, nlmon 
copies the netlink header entirely.
I'm not sure if we should *link* the ARPHRD_VSOCK identifier to the 
vsockmon header.
>>   
>>   #define ARPHRD_VOID	  0xFFFF	/* Void type, nothing is known */
>>   #define ARPHRD_NONE	  0xFFFE	/* zero header length */
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index 6b158ab..ec7a05d 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -96,6 +96,7 @@
>>   #include <linux/unistd.h>
>>   #include <linux/wait.h>
>>   #include <linux/workqueue.h>
>> +#include <linux/if_arp.h>
>>   #include <net/sock.h>
>>   #include <net/af_vsock.h>
>>   
>> @@ -2012,6 +2013,110 @@ const struct vsock_transport *vsock_core_get_transport(void)
>>   }
>>   EXPORT_SYMBOL_GPL(vsock_core_get_transport);
>>   
>> +/**** TAP ****/
> Feel free to put this in a separate source file.  The Kbuild can link
> multiple objects into a single kernel module.  That would be cleaner
> than using a big comment to separate it from af_vsock.c code.
I'm following the af_vsock.c style, where different logic is separated 
using this style of comments. It is not a lot of code
so I thought it would be cleaner to have it in the same file.
> I wonder whether this tap mechanism should be generic so that netlink,
> vsock, and other address families can reuse the code.  This is basically
> copied from netlink.
>
>> +static DEFINE_SPINLOCK(vsock_tap_lock);
>> +static struct list_head vsock_tap_all __read_mostly;
>> +
>> +int vsock_init_tap(void) {
>> +	INIT_LIST_HEAD(&vsock_tap_all);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(vsock_init_tap);
> Can you use LIST_HEAD_INIT() on the vsock_tap_all declaration line to
> eliminate the need for vsock_init_tap() completely?
Yes, I'll do that. I though we may need to do more initialization at 
some point, but for now it is better to just
statically initialize it.
>> +
>> +int vsock_add_tap(struct vsock_tap *vt) {
>> +	if (unlikely(vt->dev->type != ARPHRD_VSOCKMON))
>> +		return -EINVAL;
>> +
>> +	spin_lock(&vsock_tap_lock);
>> +	list_add_rcu(&vt->list, &vsock_tap_all);
>> +	spin_unlock(&vsock_tap_lock);
>> +
>> +	__module_get(vt->module);
> It's slightly safer to get the module before publishing it on the list.
> But in practice I guess the caller is the module so the module won't
> disappear underneath us.
This function is equal to the function in af_netlink.c used by nlmon. As 
you said, in practice the caller is the module
so it won't disappear.
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(vsock_add_tap);
>> +
>> +int __vsock_remove_tap(struct vsock_tap *vt) {
>> +	bool found = false;
>> +	struct vsock_tap *tmp;
>> +
>> +	spin_lock(&vsock_tap_lock);
>> +
>> +	list_for_each_entry(tmp, &vsock_tap_all, list) {
>> +		if (vt == tmp) {
>> +			list_del_rcu(&vt->list);
>> +			found = true;
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	pr_warn("__vsock_remove_tap: %p not found\n", vt);
>> +out:
>> +	spin_unlock(&vsock_tap_lock);
>> +
>> +	if (found)
>> +		module_put(vt->module);
>> +
>> +	return found ? 0 : -ENODEV;
>> +}
>> +
>> +int vsock_remove_tap(struct vsock_tap *vt)
>> +{
>> +	int ret;
>> +
>> +	ret = __vsock_remove_tap(vt);
>> +	synchronize_net();
> module_put() should be called after synchronize_net().  That way we
> guarantee that no one is still accessing vt when the module is put.  The
> caller is probably the module though, so maybe there is an implicit
> reference...
Also, it is equal to the function in af_netlink.c used by nlmon. And as 
before, this function is supposed
to be called from the same module so it can't disappear before calling 
synchronize_net().
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(vsock_remove_tap);
>> +
>> +static int __vsock_deliver_tap_skb(struct sk_buff *skb,
>> +				     struct net_device *dev)
>> +{
>> +	struct sk_buff *vskb;
>> +	int ret = 0;
>> +
>> +	if (skb) {
>> +		dev_hold(dev);
>> +		vskb = skb_clone(skb, GFP_ATOMIC);
> You must handle the case where skb_clone() returns NULL.  In other
> words, we don't have enough memory to capture the packet and just return
> -ENOMEM.
You are right!
>> +		vskb->dev = dev;
>> +		vskb->pkt_type = PACKET_USER;
> I'm not sure if PACKET_USER is correct (or if pkt_type matters at all).
>
> PACKET_USER/PACKET_KERNEL seems to be purely a netlink concept to
> distinguish netlink packets originating from the kernel or userspace.
>
> AF_VSOCK is more like Ethernet where packets come from the host itself
> or from a VM.  I would consider PACKET_HOST and PACKET_OTHERHOST.
>
I'm not sure either if pkt_type matters. I haven't found any code that 
uses it so I'll just leave it unset.
>> +		ret = dev_queue_xmit(vskb);
>> +		if (unlikely(ret > 0))
>> +			ret = net_xmit_errno(ret);
>> +
>> +		dev_put(dev);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static void __vsock_deliver_tap(struct sk_buff *skb)
>> +{
>> +	int ret;
>> +	struct vsock_tap *tmp;
>> +
>> +	list_for_each_entry_rcu(tmp, &vsock_tap_all, list) {
>> +		ret = __vsock_deliver_tap_skb(skb, tmp->dev);
>> +		if (unlikely(ret))
>> +			break;
>> +	}
>> +}
>> +
>> +void vsock_deliver_tap(struct sk_buff *skb)
>> +{
>> +	rcu_read_lock();
>> +
>> +	if (unlikely(!list_empty(&vsock_tap_all)))
>> +		__vsock_deliver_tap(skb);
>> +
>> +	rcu_read_unlock();
>> +}
>> +EXPORT_SYMBOL_GPL(vsock_deliver_tap);
>> +
>>   MODULE_AUTHOR("VMware, Inc.");
>>   MODULE_DESCRIPTION("VMware Virtual Socket Family");
>>   MODULE_VERSION("1.0.1.0-k");
>> -- 
>> 2.8.3
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 2/3] vsockmon: Add vsockmon device
  2016-06-01 21:15   ` Stefan Hajnoczi
@ 2016-06-09 15:21     ` Gerard Garcia
  2016-06-10 15:37       ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Gerard Garcia @ 2016-06-09 15:21 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev, jhansen



On 06/01/2016 11:15 PM, Stefan Hajnoczi wrote:
> On Sat, May 28, 2016 at 06:29:06PM +0200, ggarcia@abra.uab.cat wrote:
>> From: Gerard Garcia <ggarcia@deic.uab.cat>
>>
>> Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
>> ---
>>   drivers/net/Kconfig           |   8 ++
>>   drivers/net/Makefile          |   1 +
>>   drivers/net/vsockmon.c        | 171 ++++++++++++++++++++++++++++++++++++++++++
>>   include/uapi/linux/Kbuild     |   1 +
>>   include/uapi/linux/vsockmon.h |  37 +++++++++
>>   5 files changed, 218 insertions(+)
>>   create mode 100644 drivers/net/vsockmon.c
>>   create mode 100644 include/uapi/linux/vsockmon.h
>>
>> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>> index 0c5415b..42c43b6 100644
>> --- a/drivers/net/Kconfig
>> +++ b/drivers/net/Kconfig
>> @@ -330,6 +330,14 @@ config NET_VRF
>>   	  This option enables the support for mapping interfaces into VRF's. The
>>   	  support enables VRF devices.
>>   
>> +config VSOCKMON
>> +    tristate "Virtual vsock monitoring device"
>> +    depends on VHOST_VSOCK
>> +    ---help---
>> +     This option enables a monitoring net device for vsock sockets. It is
>> +     mostly intended for developers or support to debug vsock issues. If
>> +     unsure, say N.
>> +
>>   endif # NET_CORE
>>   
>>   config SUNGEM_PHY
>> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
>> index 7336cbd..e2188d4 100644
>> --- a/drivers/net/Makefile
>> +++ b/drivers/net/Makefile
>> @@ -28,6 +28,7 @@ obj-$(CONFIG_GENEVE) += geneve.o
>>   obj-$(CONFIG_GTP) += gtp.o
>>   obj-$(CONFIG_NLMON) += nlmon.o
>>   obj-$(CONFIG_NET_VRF) += vrf.o
>> +obj-$(CONFIG_VSOCKMON) += vsockmon.o
>>   
>>   #
>>   # Networking Drivers
>> diff --git a/drivers/net/vsockmon.c b/drivers/net/vsockmon.c
>> new file mode 100644
>> index 0000000..becddc9
>> --- /dev/null
>> +++ b/drivers/net/vsockmon.c
>> @@ -0,0 +1,171 @@
>> +#include <linux/module.h>
>> +#include <linux/kernel.h>
>> +#include <linux/if_arp.h>
>> +#include <net/rtnetlink.h>
>> +#include <net/sock.h>
>> +#include <net/af_vsock.h>
>> +#include <uapi/linux/vsockmon.h>
>> +
>> +/* Virtio transport max packet size plus header */
>> +#define DEFAULT_MTU 1024 * 64 + sizeof(struct af_vsockmon_hdr);
>> +
>> +struct pcpu_lstats {
>> +	u64 rx_packets;
>> +	u64 rx_bytes;
>> +	struct u64_stats_sync syncp;
>> +};
>> +
>> +static int vsockmon_dev_init(struct net_device *dev)
>> +{
>> +	dev->lstats = netdev_alloc_pcpu_stats(struct pcpu_lstats);
>> +	return dev->lstats == NULL ? -ENOMEM : 0;
>> +}
>> +
>> +static void vsockmon_dev_uninit(struct net_device *dev)
>> +{
>> +	free_percpu(dev->lstats);
>> +}
>> +
>> +struct vsockmon {
>> +	struct vsock_tap vt;
>> +};
>> +
>> +static int vsockmon_open(struct net_device *dev)
>> +{
>> +	struct vsockmon *vsockmon = netdev_priv(dev);
>> +
>> +	vsockmon->vt.dev = dev;
>> +	vsockmon->vt.module = THIS_MODULE;
>> +	return vsock_add_tap(&vsockmon->vt);
>> +}
>> +
>> +static int vsockmon_close(struct net_device *dev) {
>> +	struct vsockmon *vsockmon = netdev_priv(dev);
>> +
>> +	return vsock_remove_tap(&vsockmon->vt);
>> +}
>> +
>> +static netdev_tx_t vsockmon_xmit(struct sk_buff *skb, struct net_device *dev)
>> +{
>> +	int len = skb->len;
>> +	struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	stats->rx_bytes += len;
>> +	stats->rx_packets++;
>> +	u64_stats_update_end(&stats->syncp);
>> +
>> +	dev_kfree_skb(skb);
>> +
>> +	return NETDEV_TX_OK;
>> +}
>> +
>> +static struct rtnl_link_stats64 *
>> +vsockmon_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>> +{
>> +	int i;
>> +	u64 bytes = 0, packets = 0;
>> +
>> +	for_each_possible_cpu(i) {
>> +		const struct pcpu_lstats *vstats;
>> +		u64 tbytes, tpackets;
>> +		unsigned int start;
>> +
>> +		vstats = per_cpu_ptr(dev->lstats, i);
>> +
>> +		do {
>> +			start = u64_stats_fetch_begin_irq(&vstats->syncp);
>> +			tbytes = vstats->rx_bytes;
>> +			tpackets = vstats->rx_packets;
>> +		} while (u64_stats_fetch_retry_irq(&vstats->syncp, start));
>> +
>> +		packets += tpackets;
>> +		bytes += tbytes;
>> +	}
>> +
>> +	stats->rx_packets = packets;
>> +	stats->tx_packets = 0;
>> +
>> +	stats->rx_bytes = bytes;
>> +	stats->tx_bytes = 0;
>> +
>> +	return stats;
>> +}
>> +
>> +static int vsockmon_is_valid_mtu(int new_mtu)
>> +{
>> +	return new_mtu >= (int) sizeof(struct af_vsockmon_hdr);
>> +}
>> +
>> +static int vsockmon_change_mtu(struct net_device *dev, int new_mtu)
>> +{
>> +	if (!vsockmon_is_valid_mtu(new_mtu))
>> +		return -EINVAL;
>> +
>> +	dev->mtu = new_mtu;
>> +	return 0;
>> +}
> I wonder if the mtu serves any purpose.  What happens when you change it
> from the default value?
It does not happen anything as we don't consider the mtu anywhere.
I think it is interesting to set it so monitoring programs can adjust 
their buffers and it may be interesting to be able to change it so if 
the transport changes the mtu (or if vmci support is added) it can be 
adjusted.
>> +
>> +static const struct net_device_ops vsockmon_ops = {
>> +	.ndo_init = vsockmon_dev_init,
>> +	.ndo_uninit = vsockmon_dev_uninit,
>> +	.ndo_open = vsockmon_open,
>> +	.ndo_stop = vsockmon_close,
>> +	.ndo_start_xmit = vsockmon_xmit,
>> +	.ndo_get_stats64 = vsockmon_get_stats64,
>> +	.ndo_change_mtu = vsockmon_change_mtu,
>> +};
>> +
>> +static u32 always_on(struct net_device *dev)
>> +{
>> +	return 1;
>> +}
>> +
>> +static const struct ethtool_ops vsockmon_ethtool_ops = {
>> +	.get_link = always_on,
>> +};
>> +
>> +static void vsockmon_setup(struct net_device *dev)
>> +{
>> +	dev->type = ARPHRD_VSOCKMON;
>> +	dev->priv_flags |= IFF_NO_QUEUE;
>> +
>> +	dev->netdev_ops	= &vsockmon_ops;
>> +	dev->ethtool_ops = &vsockmon_ethtool_ops;
>> +	dev->destructor	= free_netdev;
>> +
>> +	dev->features = NETIF_F_SG | NETIF_F_FRAGLIST |
>> +			NETIF_F_HIGHDMA | NETIF_F_LLTX;
>> +
>> +	dev->flags = IFF_NOARP;
>> +
>> +	dev->mtu = DEFAULT_MTU;
>> +}
>> +
>> +static struct rtnl_link_ops vsockmon_link_ops __read_mostly = {
>> +	.kind			= "vsockmon",
>> +	.priv_size		= sizeof(struct vsockmon),
>> +	.setup			= vsockmon_setup,
>> +};
>> +
>> +static __init int vsockmon_register(void)
>> +{
>> +	int ret = rtnl_link_register(&vsockmon_link_ops);
>> +	if (!ret)
>> +		vsock_init_tap();
>> +
>> +	return ret;
>> +}
>> +
>> +static __exit void vsockmon_unregister(void)
>> +{
>> +	rtnl_link_unregister(&vsockmon_link_ops);
>> +}
>> +
>> +module_init(vsockmon_register);
>> +module_exit(vsockmon_unregister);
>> +
>> +MODULE_LICENSE("GPL v2");
>> +MODULE_AUTHOR("Gerard Garcia <ggarcia@deic.uab.cat>");
>> +MODULE_DESCRIPTION("Vsock monitoring device");
>> +MODULE_ALIAS_RTNL_LINK("vsockmon");
> there should be some mention that this is derived from nlmon.c.
Ok, I'll add in the description that it is based on the nlmon code.
>> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
>> index 5f047d2..b1836cc 100644
>> --- a/include/uapi/linux/Kbuild
>> +++ b/include/uapi/linux/Kbuild
>> @@ -454,6 +454,7 @@ header-y += virtio_scsi.h
>>   header-y += virtio_types.h
>>   header-y += virtio_vsock.h
>>   header-y += vm_sockets.h
>> +header-y += vsockmon.h
>>   header-y += vt.h
>>   header-y += wait.h
>>   header-y += wanrouter.h
>> diff --git a/include/uapi/linux/vsockmon.h b/include/uapi/linux/vsockmon.h
>> new file mode 100644
>> index 0000000..c73166f
>> --- /dev/null
>> +++ b/include/uapi/linux/vsockmon.h
>> @@ -0,0 +1,37 @@
>> +#ifndef _UAPI_VSOCKMON_H
>> +#define _UAPI_VSOCKMON_H
>> +
>> +#include <linux/virtio_vsock.h>
>> +
>> +/* Packet structure of packets received from the vsockmon device. */
>> +
>> +struct af_vsockmon_g {
>> +	unsigned short op; /* enum af_vsock_g_ops */
>> +	unsigned int src_cid;
>> +	unsigned int src_port;
>> +	unsigned int dst_cid;
>> +	unsigned int dst_port;
>> +};
>> +
>> +struct af_vsockmon_hdr {
>> +	unsigned short type;  /* enum af_vosck_type */
>> +	struct af_vsockmon_g g_hdr;
>> +	union {
>> +		struct virtio_vsock_hdr virtio_hdr;
>> +	} t_hdr;
>> +};
> How does endianness work?  virtio_hdr uses little-endian fields on the
> wire.  I guess that af_vsockmon_g is always CPU-endian.
Yes, af_vsockmon_g is CPU-endian. I don't know what criteria is normally 
used regarding the endianness of structs facing user space but I think 
it is better to not modify the vsock transport structs.
>> +
>> +enum af_vsockmon_type {
>> +	AF_VSOCK_GENERIC = 1, /* No transport header */
>> +	AF_VSOCK_VIRTIO = 2,  /* Addtional virtio transport header */
>> +};
>> +
>> +enum af_vsockmon_g_ops {
>> +	AF_VSOCK_G_OP_UNKNOWN = 0,
>> +	AF_VSOCK_G_OP_CONNECT = 1,
>> +	AF_VSOCK_G_OP_DISCONNECT = 2,
>> +	AF_VSOCK_G_OP_CONTROL = 3,
>> +	AF_VSOCK_G_OP_PAYLOAD = 4,
>> +};
>> +
>> +#endif
>> -- 
>> 2.8.3
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 3/3] vsockmon: Add vsock hooks
  2016-06-01 21:19   ` Stefan Hajnoczi
@ 2016-06-09 15:27     ` Gerard Garcia
  0 siblings, 0 replies; 13+ messages in thread
From: Gerard Garcia @ 2016-06-09 15:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev, jhansen



On 06/01/2016 11:19 PM, Stefan Hajnoczi wrote:
> On Sat, May 28, 2016 at 06:29:07PM +0200, ggarcia@abra.uab.cat wrote:
>> From: Gerard Garcia <ggarcia@deic.uab.cat>
>>
>> Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
>> ---
>>   drivers/vhost/vsock.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 71 insertions(+)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index 17bfe4e..8fd5125 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -14,8 +14,10 @@
>>   #include <net/sock.h>
>>   #include <linux/virtio_vsock.h>
>>   #include <linux/vhost.h>
>> +#include <linux/skbuff.h>
>>   
>>   #include <net/af_vsock.h>
>> +#include <uapi/linux/vsockmon.h>
>>   #include "vhost.h"
>>   
>>   #define VHOST_VSOCK_DEFAULT_HOST_CID	2
>> @@ -45,6 +47,67 @@ struct vhost_vsock {
>>   	u32 guest_cid;
>>   };
>>   
>> +static struct sk_buff *
>> +virtio_vsock_pkt_to_skb(struct virtio_vsock_pkt *pkt)
>> +{
>> +	struct sk_buff *skb;
>> +	struct af_vsockmon_hdr *hdr;
>> +	void *payload;
>> +
>> +	u32 skb_len = sizeof(struct af_vsockmon_hdr) + pkt->len;
>> +
>> +	skb = alloc_skb(skb_len, GFP_ATOMIC);
>> +	if (!skb)
>> +		return NULL;
>> +
>> +	skb_reserve(skb, sizeof(struct af_vsockmon_hdr));
>> +
>> +	if (pkt->len) {
>> +		payload = skb_put(skb, pkt->len);
>> +		memcpy(payload, pkt->buf, pkt->len);
>> +	}
>> +
>> +	hdr = (struct af_vsockmon_hdr *) skb_push(skb, sizeof(*hdr));
>> +	hdr->type = AF_VSOCK_VIRTIO;
>> +
>> +	switch(pkt->hdr.op) {
>> +		case VIRTIO_VSOCK_OP_REQUEST:
>> +		case VIRTIO_VSOCK_OP_RESPONSE:
>> +			hdr->g_hdr.op = AF_VSOCK_G_OP_CONNECT;
>> +			break;
>> +		case VIRTIO_VSOCK_OP_RST:
>> +		case VIRTIO_VSOCK_OP_SHUTDOWN:
>> +			hdr->g_hdr.op = AF_VSOCK_G_OP_DISCONNECT;
>> +			break;
>> +		case VIRTIO_VSOCK_OP_RW:
>> +			hdr->g_hdr.op = AF_VSOCK_G_OP_PAYLOAD;
>> +			break;
>> +		case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
>> +		case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
>> +			hdr->g_hdr.op = AF_VSOCK_G_OP_CONTROL;
>> +			break;
>> +		default:
>> +			hdr->g_hdr.op = AF_VSOCK_G_OP_UNKNOWN;
>> +			break;
>> +	}
>> +
>> +	hdr->g_hdr.src_cid = pkt->hdr.src_cid;
>> +	hdr->g_hdr.src_port = pkt->hdr.src_port;
>> +	hdr->g_hdr.dst_cid = pkt->hdr.dst_cid;
>> +	hdr->g_hdr.dst_port = pkt->hdr.dst_port;
> Careful with endianness here.  pkt->hdr uses little-endian fields.
True, I have not considered this.
>> +
>> +	hdr->t_hdr.virtio_hdr = pkt->hdr;
>> +
>> +	return skb;
>> +}
>> +
>> +static void vsock_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
>> +{
>> +	struct sk_buff *skb = virtio_vsock_pkt_to_skb(pkt);
>> +	if (skb)
>> +		vsock_deliver_tap(skb);
> vsock_deliver_tap() doesn't take ownership of skb so it is leaked here!
>
> In fact, given that the core vsock code is not skb-based, perhaps the
> tap code shouldn't clone at all.  Just take ownership of the skb.  This
> avoids the extra allocation and memcpy.
>
You are right! and yes, I agree that would be better to avoid the clone. 
As you say I can just take ownership of the skb before dev_queue_xmit() 
in vsock_deliver_tap() so it won't be free'd after the call and avoid 
the clone.
>> +}
>> +
>>   static u32 vhost_transport_get_local_cid(void)
>>   {
>>   	return VHOST_VSOCK_DEFAULT_HOST_CID;
>> @@ -147,6 +210,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>>   
>>   		vsock->total_tx_buf -= pkt->len;
>>   
>> +		/* Deliver to monitoring devices all correctly transmitted
>> +		 * packets.
>> +		 */
>> +		vsock_deliver_tap_pkt(pkt);
>> +
>>   		virtio_transport_free_pkt(pkt);
>>   	}
>>   	if (added)
>> @@ -367,6 +435,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
>>   			continue;
>>   		}
>>   
>> +		/* Deliver to monitoring devices all received packets */
>> +		vsock_deliver_tap_pkt(pkt);
>> +
>>   		/* Only accept correctly addressed packets */
>>   		if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
>>   			virtio_transport_recv_pkt(pkt);
>> -- 
>> 2.8.3
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 2/3] vsockmon: Add vsockmon device
  2016-06-09 15:21     ` Gerard Garcia
@ 2016-06-10 15:37       ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2016-06-10 15:37 UTC (permalink / raw)
  To: Gerard Garcia; +Cc: netdev, jhansen

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

On Thu, Jun 09, 2016 at 05:21:26PM +0200, Gerard Garcia wrote:
> > > diff --git a/include/uapi/linux/vsockmon.h b/include/uapi/linux/vsockmon.h
> > > new file mode 100644
> > > index 0000000..c73166f
> > > --- /dev/null
> > > +++ b/include/uapi/linux/vsockmon.h
> > > @@ -0,0 +1,37 @@
> > > +#ifndef _UAPI_VSOCKMON_H
> > > +#define _UAPI_VSOCKMON_H
> > > +
> > > +#include <linux/virtio_vsock.h>
> > > +
> > > +/* Packet structure of packets received from the vsockmon device. */
> > > +
> > > +struct af_vsockmon_g {
> > > +	unsigned short op; /* enum af_vsock_g_ops */
> > > +	unsigned int src_cid;
> > > +	unsigned int src_port;
> > > +	unsigned int dst_cid;
> > > +	unsigned int dst_port;
> > > +};
> > > +
> > > +struct af_vsockmon_hdr {
> > > +	unsigned short type;  /* enum af_vosck_type */
> > > +	struct af_vsockmon_g g_hdr;
> > > +	union {
> > > +		struct virtio_vsock_hdr virtio_hdr;
> > > +	} t_hdr;
> > > +};
> > How does endianness work?  virtio_hdr uses little-endian fields on the
> > wire.  I guess that af_vsockmon_g is always CPU-endian.
> Yes, af_vsockmon_g is CPU-endian. I don't know what criteria is normally
> used regarding the endianness of structs facing user space but I think it is
> better to not modify the vsock transport structs.

Endianness must to be documented in this uapi header file.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 1/3] vsockmon: Add tap functions.
  2016-06-09 15:02     ` Gerard Garcia
@ 2016-06-10 15:44       ` Stefan Hajnoczi
  2016-06-14 12:05         ` Jorgen S. Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2016-06-10 15:44 UTC (permalink / raw)
  To: Gerard Garcia; +Cc: netdev, jhansen

[-- Attachment #1: Type: text/plain, Size: 2390 bytes --]

On Thu, Jun 09, 2016 at 05:02:47PM +0200, Gerard Garcia wrote:
> On 06/01/2016 11:07 PM, Stefan Hajnoczi wrote:
> > On Sat, May 28, 2016 at 06:29:05PM +0200, ggarcia@abra.uab.cat wrote:
> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > index 6b158ab..ec7a05d 100644
> > > --- a/net/vmw_vsock/af_vsock.c
> > > +++ b/net/vmw_vsock/af_vsock.c
> > > @@ -96,6 +96,7 @@
> > >   #include <linux/unistd.h>
> > >   #include <linux/wait.h>
> > >   #include <linux/workqueue.h>
> > > +#include <linux/if_arp.h>
> > >   #include <net/sock.h>
> > >   #include <net/af_vsock.h>
> > > @@ -2012,6 +2013,110 @@ const struct vsock_transport *vsock_core_get_transport(void)
> > >   }
> > >   EXPORT_SYMBOL_GPL(vsock_core_get_transport);
> > > +/**** TAP ****/
> > Feel free to put this in a separate source file.  The Kbuild can link
> > multiple objects into a single kernel module.  That would be cleaner
> > than using a big comment to separate it from af_vsock.c code.
> I'm following the af_vsock.c style, where different logic is separated using
> this style of comments. It is not a lot of code
> so I thought it would be cleaner to have it in the same file.

It's up to the af_vsock.c maintainer, but if we keep appending
independent chunks of code to one file it becomes hard to manage and
chances of conflicts during patch merging increases.

> > > +int vsock_add_tap(struct vsock_tap *vt) {
> > > +	if (unlikely(vt->dev->type != ARPHRD_VSOCKMON))
> > > +		return -EINVAL;
> > > +
> > > +	spin_lock(&vsock_tap_lock);
> > > +	list_add_rcu(&vt->list, &vsock_tap_all);
> > > +	spin_unlock(&vsock_tap_lock);
> > > +
> > > +	__module_get(vt->module);
> > It's slightly safer to get the module before publishing it on the list.
> > But in practice I guess the caller is the module so the module won't
> > disappear underneath us.
> This function is equal to the function in af_netlink.c used by nlmon. As you
> said, in practice the caller is the module
> so it won't disappear.

Yes, there isn't a huge win right now but given that it's easy to
resolve the issue I'd do it.  The problem comes when people copy-paste
code that contains assumptions and the assumption no longer holds.
Better to write it in the safe way, eliminating the assumption, so that
derived code will be correct under more conditions.  There is no
drawback to moving __module_get() above the spin_lock().

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 1/3] vsockmon: Add tap functions.
  2016-06-10 15:44       ` Stefan Hajnoczi
@ 2016-06-14 12:05         ` Jorgen S. Hansen
  0 siblings, 0 replies; 13+ messages in thread
From: Jorgen S. Hansen @ 2016-06-14 12:05 UTC (permalink / raw)
  To: Stefan Hajnoczi, Gerard Garcia; +Cc: netdev

On 6/10/16, 5:44 PM, "Stefan Hajnoczi" <stefanha@redhat.com> wrote:



>On Thu, Jun 09, 2016 at 05:02:47PM +0200, Gerard Garcia wrote:
>> On 06/01/2016 11:07 PM, Stefan Hajnoczi wrote:
>> > On Sat, May 28, 2016 at 06:29:05PM +0200, ggarcia@abra.uab.cat wrote:
>> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > > index 6b158ab..ec7a05d 100644
>> > > --- a/net/vmw_vsock/af_vsock.c
>> > > +++ b/net/vmw_vsock/af_vsock.c
>> > > @@ -96,6 +96,7 @@
>> > >   #include <linux/unistd.h>
>> > >   #include <linux/wait.h>
>> > >   #include <linux/workqueue.h>
>> > > +#include <linux/if_arp.h>
>> > >   #include <net/sock.h>
>> > >   #include <net/af_vsock.h>
>> > > @@ -2012,6 +2013,110 @@ const struct vsock_transport *vsock_core_get_transport(void)
>> > >   }
>> > >   EXPORT_SYMBOL_GPL(vsock_core_get_transport);
>> > > +/**** TAP ****/
>> > Feel free to put this in a separate source file.  The Kbuild can link
>> > multiple objects into a single kernel module.  That would be cleaner
>> > than using a big comment to separate it from af_vsock.c code.
>> I'm following the af_vsock.c style, where different logic is separated using
>> this style of comments. It is not a lot of code
>> so I thought it would be cleaner to have it in the same file.
>
>It's up to the af_vsock.c maintainer, but if we keep appending
>independent chunks of code to one file it becomes hard to manage and
>chances of conflicts during patch merging increases.

I agree with Stefan - af_vsock.c is already undesirably large, so it would be great if you could make this a separate file.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-06-14 12:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-28 16:29 [RFC 0/3] vsockmon: virtual device to monitor AF_VSOCK sockets ggarcia
2016-05-28 16:29 ` [RFC 1/3] vsockmon: Add tap functions ggarcia
2016-06-01 21:07   ` Stefan Hajnoczi
2016-06-09 15:02     ` Gerard Garcia
2016-06-10 15:44       ` Stefan Hajnoczi
2016-06-14 12:05         ` Jorgen S. Hansen
2016-05-28 16:29 ` [RFC 2/3] vsockmon: Add vsockmon device ggarcia
2016-06-01 21:15   ` Stefan Hajnoczi
2016-06-09 15:21     ` Gerard Garcia
2016-06-10 15:37       ` Stefan Hajnoczi
2016-05-28 16:29 ` [RFC 3/3] vsockmon: Add vsock hooks ggarcia
2016-06-01 21:19   ` Stefan Hajnoczi
2016-06-09 15:27     ` Gerard Garcia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.