* [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput
@ 2019-05-10 12:58 Stefano Garzarella
  2019-05-10 12:58 ` [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket Stefano Garzarella
                   ` (17 more replies)
  0 siblings, 18 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-10 12:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi, Jason Wang

While testing this new series (v2), I discovered huge memory usage and a
memory leak in the guest's virtio-vsock driver when sending 1-byte packets
to the guest.

These issues have been present since the virtio-vsock driver was introduced.
I added patches 1 and 2 to this series to fix them, in order to better track
the performance trends.

v1: https://patchwork.kernel.org/cover/10885431/

v2:
- Add patch 1 to limit the memory usage
- Add patch 2 to avoid memory leak during the socket release
- Add patch 3 to fix locking of fwd_cnt and buf_alloc
- Patch 4: fix 'free_space' type (u32 instead of s64) [Stefan]
- Patch 5: Avoid integer underflow of iov_len [Stefan]
- Patch 5: Fix packet capture in order to see the exact packets that are
           delivered. [Stefan]
- Add patch 8 to make the RX buffer size tunable [Stefan]

Below are the benchmarks, step by step. I used iperf3 [1] modified to add VSOCK
support.
As Michael suggested in v1, I booted the host and guest with 'nosmap' and
added a column with virtio-net + vhost-net performance.

A brief description of patches:
- Patches 1+2: limit the memory usage with an extra copy and avoid a memory leak
- Patches 3+4: fix locking and reduce the number of credit update messages sent
               to the transmitter
- Patches 5+6: allow the host to split packets across multiple buffers and use
               VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the maximum packet size allowed
- Patches 7+8: increase RX buffer size to 64 KiB

                    host -> guest [Gbps]
pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
                                                                     TCP_NODELAY
64         0.068     0.063    0.130    0.131    0.128         0.188     0.187
256        0.274     0.236    0.392    0.338    0.282         0.749     0.654
512        0.531     0.457    0.862    0.725    0.602         1.419     1.414
1K         0.954     0.827    1.591    1.598    1.548         2.599     2.640
2K         1.783     1.543    3.731    3.637    3.469         4.530     4.754
4K         3.332     3.436    7.164    7.124    6.494         7.738     7.696
8K         5.792     5.530   11.653   11.787   11.444        12.307    11.850
16K        8.405     8.462   16.372   16.855   17.562        16.936    16.954
32K       14.208    13.669   18.945   20.009   23.128        21.980    23.015
64K       21.082    18.893   20.266   20.903   30.622        27.290    27.383
128K      20.696    20.148   20.112   21.746   32.152        30.446    30.990
256K      20.801    20.589   20.725   22.685   34.721        33.151    32.745
512K      21.220    20.465   20.432   22.106   34.496        36.847    31.096

                    guest -> host [Gbps]
pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
                                                                     TCP_NODELAY
64         0.089     0.091    0.120    0.115    0.117         0.274     0.272
256        0.352     0.354    0.452    0.445    0.451         1.085     1.136
512        0.705     0.704    0.893    0.858    0.898         2.131     1.882
1K         1.394     1.433    1.721    1.669    1.691         3.984     3.576
2K         2.818     2.874    3.316    3.249    3.303         6.719     6.359
4K         5.293     5.397    6.129    5.933    6.082        10.105     9.860
8K         8.890     9.151   10.990   10.545   10.519        15.239    14.868
16K       11.444    11.018   12.074   15.255   15.577        20.551    20.848
32K       11.229    10.875   10.857   24.401   25.227        26.294    26.380
64K       10.832    10.545   10.816   39.487   39.616        34.996    32.041
128K      10.435    10.241   10.500   39.813   40.012        38.379    35.055
256K      10.263     9.866    9.845   34.971   35.143        36.559    37.232
512K      10.224    10.060   10.092   35.469   34.627        34.963    33.401

As Stefan suggested in v1, this time I also measured the efficiency, computed
as follows:
    efficiency = Mbps / (%CPU_Host + %CPU_Guest)

'%CPU_Guest' is measured inside the VM. I know this is not the most accurate
method, but iperf3 provides it for free and it can serve as an indication.
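
As a worked illustration of the efficiency metric, here is a minimal sketch in
userspace C. The function name, parameter names, and any sample values are
hypothetical and are not taken from the benchmark runs.

```c
#include <assert.h>

/* efficiency = Mbps / (%CPU_Host + %CPU_Guest)
 *
 * Illustrative helper only: the name and parameters are assumptions,
 * not part of iperf3 or of this series.
 */
static double vsock_efficiency(double mbps, double cpu_host_pct,
			       double cpu_guest_pct)
{
	double cpu_total = cpu_host_pct + cpu_guest_pct;

	/* A zero CPU total would make the ratio meaningless. */
	assert(cpu_total > 0.0);

	return mbps / cpu_total;
}
```

For example, a hypothetical run at 1000 Mbps that uses 50% CPU on the host and
50% in the guest would score 1000 / (50 + 50) = 10.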

        host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
                                                                     TCP_NODELAY
64          0.94      0.59     3.96     4.06     4.09          2.82      2.11
256         2.62      2.50     6.45     6.09     5.81          9.64      8.73
512         5.16      4.87    13.16    12.39    11.67         17.83     17.76
1K          9.16      8.85    24.98    24.97    25.01         32.57     32.04
2K         17.41     17.03    49.09    48.59    49.22         55.31     57.14
4K         32.99     33.62    90.80    90.98    91.72         91.79     91.40
8K         58.51     59.98   153.53   170.83   167.31        137.51    132.85
16K        89.32     95.29   216.98   264.18   260.95        176.05    176.05
32K       152.94    167.10   285.75   387.02   360.81        215.49    226.30
64K       250.38    307.20   317.65   489.53   472.70        238.97    244.27
128K      327.99    335.24   335.76   523.71   486.41        253.29    260.86
256K      327.06    334.24   338.64   533.76   509.85        267.78    266.22
512K      337.36    330.61   334.95   512.90   496.35        280.42    241.43

        guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
                                                                     TCP_NODELAY
64          0.90      0.91     1.37     1.32     1.35          2.15      2.13
256         3.59      3.55     5.23     5.19     5.29          8.50      8.89
512         7.19      7.08    10.21     9.95    10.38         16.74     14.71
1K         14.15     14.34    19.85    19.06    19.33         31.44     28.11
2K         28.44     29.09    37.78    37.18    37.49         53.07     50.63
4K         55.37     57.60    71.02    69.27    70.97         81.56     79.32
8K        105.58    100.45   111.95   124.68   123.61        120.85    118.66
16K       141.63    138.24   137.67   187.41   190.20        160.43    163.00
32K       147.56    143.09   138.48   296.41   301.04        214.64    223.94
64K       144.81    143.27   138.49   433.98   462.26        298.86    269.71
128K      150.14    147.99   146.85   511.36   514.29        350.17    298.09
256K      156.69    152.25   148.69   542.19   549.97        326.42    333.32
512K      157.29    153.35   152.22   546.52   533.24        315.55    302.27

[1] https://github.com/stefano-garzarella/iperf/

Stefano Garzarella (8):
  vsock/virtio: limit the memory used per-socket
  vsock/virtio: free packets during the socket release
  vsock/virtio: fix locking for fwd_cnt and buf_alloc
  vsock/virtio: reduce credit update messages
  vhost/vsock: split packets to send using multiple buffers
  vsock/virtio: change the maximum packet size allowed
  vsock/virtio: increase RX buffer size to 64 KiB
  vsock/virtio: make the RX buffer size tunable

 drivers/vhost/vsock.c                   |  53 +++++++--
 include/linux/virtio_vsock.h            |  14 ++-
 net/vmw_vsock/virtio_transport.c        |  28 ++++-
 net/vmw_vsock/virtio_transport_common.c | 144 ++++++++++++++++++------
 4 files changed, 190 insertions(+), 49 deletions(-)

-- 
2.20.1



* [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
  2019-05-10 12:58 ` [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket Stefano Garzarella
@ 2019-05-10 12:58 ` Stefano Garzarella
  2019-05-12 16:57     ` Michael S. Tsirkin
                     ` (4 more replies)
  2019-05-10 12:58 ` [PATCH v2 2/8] vsock/virtio: free packets during the socket release Stefano Garzarella
                   ` (15 subsequent siblings)
  17 siblings, 5 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-10 12:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi, Jason Wang

Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring have been queued directly in
a per-socket list, avoiding a copy.
These buffers are preallocated by the guest with a fixed
size (4 KB).

The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 buffers of
4 KB each, using up to 1 GB of memory per socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.
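
The worst case described above can be checked with a little arithmetic. The
sketch below is a userspace illustration, not driver code; the helper name
worst_case_queued_bytes() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Memory pinned in the per-socket queue when every packet carries
 * 'payload' bytes but occupies a whole preallocated RX buffer of
 * 'buf_size' bytes. The credit only counts payload bytes, so the
 * number of queued packets is bounded by credit / payload.
 */
static uint64_t worst_case_queued_bytes(uint32_t credit, uint32_t buf_size,
					uint32_t payload)
{
	uint64_t pkts;

	assert(payload > 0 && payload <= buf_size);

	pkts = credit / payload;

	return pkts * buf_size;
}
```

With a 256 KB credit, 4 KB buffers, and 1-byte payloads this gives
262144 * 4096 = 1073741824 bytes (1 GB); with full 4 KB payloads the same
credit pins only 64 * 4096 = 262144 bytes (256 KB).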

This patch solves the issue by copying the payload into a new buffer,
which is then queued in the per-socket list; the 4 KB buffer used
by the host is freed.

In this way, the memory used by each socket respects the credit
available, and we still avoid starvation, at the cost of an
extra memory copy. When the buffer is completely full we do a
"zero-copy", moving the buffer directly into the per-socket list.
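
The copy-or-move decision can be modeled in userspace as follows. This is a
sketch under stated assumptions: struct rx_pkt and struct rx_buf are
hypothetical stand-ins for the kernel structures, and malloc() replaces
kmalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins for the packet and queued buffer. */
struct rx_pkt {
	void *buf;		/* preallocated RX buffer */
	uint32_t len;		/* payload bytes actually received */
	uint32_t buf_len;	/* size of the preallocated buffer */
};

struct rx_buf {
	void *addr;
	uint32_t len;
	bool moved;		/* true if the buffer was moved, not copied */
};

/* Fill 'out' from 'pkt'. A completely full buffer is moved (ownership
 * passes to 'out'); a partially filled one is copied into an allocation
 * sized to the payload, so queued memory tracks the credit.
 */
static bool rx_buf_from_pkt(struct rx_buf *out, struct rx_pkt *pkt)
{
	assert(pkt->len <= pkt->buf_len);

	if (pkt->len == 0)
		return false;

	if (pkt->len == pkt->buf_len) {
		out->addr = pkt->buf;
		pkt->buf = NULL;
		out->moved = true;
	} else {
		out->addr = malloc(pkt->len);
		if (!out->addr)
			return false;

		memcpy(out->addr, pkt->buf, pkt->len);
		out->moved = false;
	}

	out->len = pkt->len;

	return true;
}
```

In the copy case the caller still owns pkt->buf and can free or recycle it
right away, which is what keeps a stream of tiny packets from pinning one
full-size buffer each.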

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/vhost/vsock.c                   |  2 +
 include/linux/virtio_vsock.h            |  8 +++
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
 4 files changed, 81 insertions(+), 25 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index bb5fc0e9fbc2..7964e2daee09 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
 		return NULL;
 	}
 
+	pkt->buf_len = pkt->len;
+
 	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
 	if (nbytes != pkt->len) {
 		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index e223e2632edd..345f04ee9193 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
 	void *buf;
 	u32 len;
 	u32 off;
+	u32 buf_len;
 	bool reply;
 };
 
+struct virtio_vsock_buf {
+	struct list_head list;
+	void *addr;
+	u32 len;
+	u32 off;
+};
+
 struct virtio_vsock_pkt_info {
 	u32 remote_cid, remote_port;
 	struct vsock_sock *vsk;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 15eb5d3d4750..af1d2ce12f54 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 			break;
 		}
 
+		pkt->buf_len = buf_len;
 		pkt->len = buf_len;
 
 		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 602715fc9a75..0248d6808755 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 		pkt->buf = kmalloc(len, GFP_KERNEL);
 		if (!pkt->buf)
 			goto out_pkt;
+
+		pkt->buf_len = len;
+
 		err = memcpy_from_msg(pkt->buf, info->msg, len);
 		if (err)
 			goto out;
@@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 	return NULL;
 }
 
+static struct virtio_vsock_buf *
+virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
+{
+	struct virtio_vsock_buf *buf;
+
+	if (pkt->len == 0)
+		return NULL;
+
+	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
+	if (!buf)
+		return NULL;
+
+	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
+	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
+	 * we do not use more memory than that counted by the credit mechanism.
+	 */
+	if (zero_copy && pkt->len == pkt->buf_len) {
+		buf->addr = pkt->buf;
+		pkt->buf = NULL;
+	} else {
+		buf->addr = kmalloc(pkt->len, GFP_KERNEL);
+		if (!buf->addr) {
+			kfree(buf);
+			return NULL;
+		}
+
+		memcpy(buf->addr, pkt->buf, pkt->len);
+	}
+
+	buf->len = pkt->len;
+
+	return buf;
+}
+
+static void virtio_transport_free_buf(struct virtio_vsock_buf *buf)
+{
+	kfree(buf->addr);
+	kfree(buf);
+}
+
 /* Packet capture */
 static struct sk_buff *virtio_transport_build_skb(void *opaque)
 {
@@ -190,17 +233,15 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	return virtio_transport_get_ops()->send_pkt(pkt);
 }
 
-static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
-					struct virtio_vsock_pkt *pkt)
+static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
 {
-	vvs->rx_bytes += pkt->len;
+	vvs->rx_bytes += len;
 }
 
-static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
-					struct virtio_vsock_pkt *pkt)
+static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
 {
-	vvs->rx_bytes -= pkt->len;
-	vvs->fwd_cnt += pkt->len;
+	vvs->rx_bytes -= len;
+	vvs->fwd_cnt += len;
 }
 
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
@@ -254,36 +295,36 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 				   size_t len)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
-	struct virtio_vsock_pkt *pkt;
+	struct virtio_vsock_buf *buf;
 	size_t bytes, total = 0;
 	int err = -EFAULT;
 
 	spin_lock_bh(&vvs->rx_lock);
 	while (total < len && !list_empty(&vvs->rx_queue)) {
-		pkt = list_first_entry(&vvs->rx_queue,
-				       struct virtio_vsock_pkt, list);
+		buf = list_first_entry(&vvs->rx_queue,
+				       struct virtio_vsock_buf, list);
 
 		bytes = len - total;
-		if (bytes > pkt->len - pkt->off)
-			bytes = pkt->len - pkt->off;
+		if (bytes > buf->len - buf->off)
+			bytes = buf->len - buf->off;
 
 		/* sk_lock is held by caller so no one else can dequeue.
 		 * Unlock rx_lock since memcpy_to_msg() may sleep.
 		 */
 		spin_unlock_bh(&vvs->rx_lock);
 
-		err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
+		err = memcpy_to_msg(msg, buf->addr + buf->off, bytes);
 		if (err)
 			goto out;
 
 		spin_lock_bh(&vvs->rx_lock);
 
 		total += bytes;
-		pkt->off += bytes;
-		if (pkt->off == pkt->len) {
-			virtio_transport_dec_rx_pkt(vvs, pkt);
-			list_del(&pkt->list);
-			virtio_transport_free_pkt(pkt);
+		buf->off += bytes;
+		if (buf->off == buf->len) {
+			virtio_transport_dec_rx_pkt(vvs, buf->len);
+			list_del(&buf->list);
+			virtio_transport_free_buf(buf);
 		}
 	}
 	spin_unlock_bh(&vvs->rx_lock);
@@ -841,20 +882,24 @@ virtio_transport_recv_connected(struct sock *sk,
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 	struct virtio_vsock_sock *vvs = vsk->trans;
+	struct virtio_vsock_buf *buf;
 	int err = 0;
 
 	switch (le16_to_cpu(pkt->hdr.op)) {
 	case VIRTIO_VSOCK_OP_RW:
 		pkt->len = le32_to_cpu(pkt->hdr.len);
-		pkt->off = 0;
+		buf = virtio_transport_alloc_buf(pkt, true);
 
-		spin_lock_bh(&vvs->rx_lock);
-		virtio_transport_inc_rx_pkt(vvs, pkt);
-		list_add_tail(&pkt->list, &vvs->rx_queue);
-		spin_unlock_bh(&vvs->rx_lock);
+		if (buf) {
+			spin_lock_bh(&vvs->rx_lock);
+			virtio_transport_inc_rx_pkt(vvs, pkt->len);
+			list_add_tail(&buf->list, &vvs->rx_queue);
+			spin_unlock_bh(&vvs->rx_lock);
 
-		sk->sk_data_ready(sk);
-		return err;
+			sk->sk_data_ready(sk);
+		}
+
+		break;
 	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
 		sk->sk_write_space(sk);
 		break;
-- 
2.20.1



* [PATCH v2 2/8] vsock/virtio: free packets during the socket release
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
  2019-05-10 12:58 ` [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket Stefano Garzarella
  2019-05-10 12:58 ` Stefano Garzarella
@ 2019-05-10 12:58 ` Stefano Garzarella
  2019-05-10 22:20   ` David Miller
                     ` (3 more replies)
  2019-05-10 12:58 ` Stefano Garzarella
                   ` (14 subsequent siblings)
  17 siblings, 4 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-10 12:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi, Jason Wang

When the socket is released, we should free all packets
queued in the per-socket list in order to avoid a memory
leak.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 0248d6808755..65c8b4a23f2b 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -827,12 +827,20 @@ static bool virtio_transport_close(struct vsock_sock *vsk)
 
 void virtio_transport_release(struct vsock_sock *vsk)
 {
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	struct virtio_vsock_buf *buf;
 	struct sock *sk = &vsk->sk;
 	bool remove_sock = true;
 
 	lock_sock(sk);
 	if (sk->sk_type == SOCK_STREAM)
 		remove_sock = virtio_transport_close(vsk);
+	while (!list_empty(&vvs->rx_queue)) {
+		buf = list_first_entry(&vvs->rx_queue,
+				       struct virtio_vsock_buf, list);
+		list_del(&buf->list);
+		virtio_transport_free_buf(buf);
+	}
 	release_sock(sk);
 
 	if (remove_sock)
-- 
2.20.1



* [PATCH v2 3/8] vsock/virtio: fix locking for fwd_cnt and buf_alloc
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (3 preceding siblings ...)
  2019-05-10 12:58 ` Stefano Garzarella
@ 2019-05-10 12:58 ` Stefano Garzarella
  2019-05-10 12:58 ` Stefano Garzarella
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-10 12:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi, Jason Wang

fwd_cnt is written while holding rx_lock, so we should read it under
the same spinlock even when we are in the TX path.

Also move buf_alloc under rx_lock and add the missing locking
where we modify it.
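
The locking rule can be illustrated with a small userspace model. This is only
a sketch: a pthread mutex stands in for the kernel's rx_lock spinlock, and the
struct and function names (credit_state, credit_hdr, credit_*) are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Both counters are protected by the same lock, as in the patch. */
struct credit_state {
	pthread_mutex_t rx_lock;	/* stand-in for the rx_lock spinlock */
	uint32_t buf_alloc;
	uint32_t fwd_cnt;
};

/* Credit fields copied into an outgoing packet header. */
struct credit_hdr {
	uint32_t buf_alloc;
	uint32_t fwd_cnt;
};

static void credit_init(struct credit_state *s, uint32_t buf_alloc)
{
	pthread_mutex_init(&s->rx_lock, NULL);
	s->buf_alloc = buf_alloc;
	s->fwd_cnt = 0;
}

/* RX path: the reader consumed 'len' bytes. */
static void credit_consume(struct credit_state *s, uint32_t len)
{
	pthread_mutex_lock(&s->rx_lock);
	s->fwd_cnt += len;
	pthread_mutex_unlock(&s->rx_lock);
}

/* Buffer-size change: the write takes the same lock. */
static void credit_set_buf_alloc(struct credit_state *s, uint32_t val)
{
	pthread_mutex_lock(&s->rx_lock);
	s->buf_alloc = val;
	pthread_mutex_unlock(&s->rx_lock);
}

/* TX path: read both fields under the lock that protects their writes. */
static void credit_fill_hdr(struct credit_state *s, struct credit_hdr *hdr)
{
	assert(s && hdr);

	pthread_mutex_lock(&s->rx_lock);
	hdr->fwd_cnt = s->fwd_cnt;
	hdr->buf_alloc = s->buf_alloc;
	pthread_mutex_unlock(&s->rx_lock);
}
```

The point of the model is that every access to fwd_cnt and buf_alloc, whether
on the receive side or while building a header to transmit, goes through one
lock, so the two values in a header come from a single consistent snapshot.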

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/linux/virtio_vsock.h            | 2 +-
 net/vmw_vsock/virtio_transport_common.c | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 345f04ee9193..fb5954fc85c8 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -35,11 +35,11 @@ struct virtio_vsock_sock {
 
 	/* Protected by tx_lock */
 	u32 tx_cnt;
-	u32 buf_alloc;
 	u32 peer_fwd_cnt;
 	u32 peer_buf_alloc;
 
 	/* Protected by rx_lock */
+	u32 buf_alloc;
 	u32 fwd_cnt;
 	u32 rx_bytes;
 	struct list_head rx_queue;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 65c8b4a23f2b..f2e4e128bc86 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -246,10 +246,10 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
 
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
 {
-	spin_lock_bh(&vvs->tx_lock);
+	spin_lock_bh(&vvs->rx_lock);
 	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
 	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
-	spin_unlock_bh(&vvs->tx_lock);
+	spin_unlock_bh(&vvs->rx_lock);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
 
@@ -469,7 +469,9 @@ void virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val)
 	if (val > vvs->buf_size_max)
 		vvs->buf_size_max = val;
 	vvs->buf_size = val;
+	spin_lock_bh(&vvs->rx_lock);
 	vvs->buf_alloc = val;
+	spin_unlock_bh(&vvs->rx_lock);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_set_buffer_size);
 
-- 
2.20.1



* [PATCH v2 4/8] vsock/virtio: reduce credit update messages
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (5 preceding siblings ...)
  2019-05-10 12:58 ` Stefano Garzarella
@ 2019-05-10 12:58 ` Stefano Garzarella
  2019-05-10 12:58 ` Stefano Garzarella
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-10 12:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi, Jason Wang

In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index fb5954fc85c8..84b72026d327 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -41,6 +41,7 @@ struct virtio_vsock_sock {
 	/* Protected by rx_lock */
 	u32 buf_alloc;
 	u32 fwd_cnt;
+	u32 last_fwd_cnt;
 	u32 rx_bytes;
 	struct list_head rx_queue;
 };
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index f2e4e128bc86..b61fd5e29a1f 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -247,6 +247,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
 {
 	spin_lock_bh(&vvs->rx_lock);
+	vvs->last_fwd_cnt = vvs->fwd_cnt;
 	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
 	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
 	spin_unlock_bh(&vvs->rx_lock);
@@ -297,6 +298,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	struct virtio_vsock_sock *vvs = vsk->trans;
 	struct virtio_vsock_buf *buf;
 	size_t bytes, total = 0;
+	u32 free_space;
 	int err = -EFAULT;
 
 	spin_lock_bh(&vvs->rx_lock);
@@ -327,11 +329,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 			virtio_transport_free_buf(buf);
 		}
 	}
+
+	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+
 	spin_unlock_bh(&vvs->rx_lock);
 
-	/* Send a credit pkt to peer */
-	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
-					    NULL);
+	/* We send a credit update only when the space available seen
+	 * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
+	 */
+	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
+		virtio_transport_send_credit_update(vsk,
+						    VIRTIO_VSOCK_TYPE_STREAM,
+						    NULL);
+	}
 
 	return total;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread
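
The credit check introduced by patch 4/8 can be modelled outside the kernel. What follows is a minimal user-space sketch, not the driver code: `struct rx_credit`, `credit_free_space()` and `credit_update_needed()` are hypothetical names, and the constant copies the value of VIRTIO_VSOCK_MAX_PKT_BUF_SIZE from this series. The subtraction `fwd_cnt - last_fwd_cnt` is done on unsigned 32-bit values, so it should stay correct when `fwd_cnt` wraps around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, copied from the VIRTIO_VSOCK_MAX_PKT_BUF_SIZE define. */
#define MAX_PKT_BUF_SIZE (1024u * 64u)

/* Hypothetical stand-in for the rx_lock-protected counters. */
struct rx_credit {
	uint32_t buf_alloc;    /* receive space advertised to the peer */
	uint32_t fwd_cnt;      /* total bytes dequeued by the receiver */
	uint32_t last_fwd_cnt; /* fwd_cnt value last sent to the peer */
};

/*
 * Space the transmitter believes is available: the advertised space
 * minus the bytes dequeued since the last header carrying fwd_cnt,
 * which the peer has not heard about yet.
 */
static uint32_t credit_free_space(const struct rx_credit *c)
{
	return c->buf_alloc - (c->fwd_cnt - c->last_fwd_cnt);
}

/* True when an explicit credit update would be sent. */
static bool credit_update_needed(const struct rx_credit *c)
{
	return credit_free_space(c) < MAX_PKT_BUF_SIZE;
}
```

With a 256 KiB `buf_alloc`, this model requests an update only after more than 192 KiB have been dequeued without any outgoing packet refreshing `last_fwd_cnt`.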

* [PATCH v2 5/8] vhost/vsock: split packets to send using multiple buffers
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (8 preceding siblings ...)
  2019-05-10 12:58 ` [PATCH v2 5/8] vhost/vsock: split packets to send using multiple buffers Stefano Garzarella
@ 2019-05-10 12:58 ` Stefano Garzarella
  2019-05-10 12:58 ` [PATCH v2 6/8] vsock/virtio: change the maximum packet size allowed Stefano Garzarella
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-10 12:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi, Jason Wang

If a packet to be sent to the guest is bigger than the available
buffer, we can split it across multiple buffers, fixing the length
in the packet header of each fragment.
This is safe since virtio-vsock supports only stream sockets.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/vhost/vsock.c                   | 51 +++++++++++++++++++------
 net/vmw_vsock/virtio_transport_common.c | 15 ++++++--
 2 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 7964e2daee09..fb731d09f5f1 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -94,7 +94,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		struct iov_iter iov_iter;
 		unsigned out, in;
 		size_t nbytes;
-		size_t len;
+		size_t iov_len, payload_len;
 		int head;
 
 		spin_lock_bh(&vsock->send_pkt_list_lock);
@@ -139,8 +139,24 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		len = iov_length(&vq->iov[out], in);
-		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+		iov_len = iov_length(&vq->iov[out], in);
+		if (iov_len < sizeof(pkt->hdr)) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
+			break;
+		}
+
+		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
+		payload_len = pkt->len - pkt->off;
+
+		/* If the packet is greater than the space available in the
+		 * buffer, we split it using multiple buffers.
+		 */
+		if (payload_len > iov_len - sizeof(pkt->hdr))
+			payload_len = iov_len - sizeof(pkt->hdr);
+
+		/* Set the correct length in the header */
+		pkt->hdr.len = cpu_to_le32(payload_len);
 
 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
 		if (nbytes != sizeof(pkt->hdr)) {
@@ -149,16 +165,34 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
-		if (nbytes != pkt->len) {
+		nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
+				      &iov_iter);
+		if (nbytes != payload_len) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Faulted on copying pkt buf\n");
 			break;
 		}
 
-		vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
+		vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
 		added = true;
 
+		/* Deliver to monitoring devices all correctly transmitted
+		 * packets.
+		 */
+		virtio_transport_deliver_tap_pkt(pkt);
+
+		pkt->off += payload_len;
+
+		/* If we didn't send all the payload we can requeue the packet
+		 * to send it with the next available buffer.
+		 */
+		if (pkt->off < pkt->len) {
+			spin_lock_bh(&vsock->send_pkt_list_lock);
+			list_add(&pkt->list, &vsock->send_pkt_list);
+			spin_unlock_bh(&vsock->send_pkt_list_lock);
+			continue;
+		}
+
 		if (pkt->reply) {
 			int val;
 
@@ -169,11 +203,6 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 				restart_tx = true;
 		}
 
-		/* Deliver to monitoring devices all correctly transmitted
-		 * packets.
-		 */
-		virtio_transport_deliver_tap_pkt(pkt);
-
 		virtio_transport_free_pkt(pkt);
 	}
 	if (added)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b61fd5e29a1f..3f313bcd6a26 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -135,8 +135,17 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
 	struct virtio_vsock_pkt *pkt = opaque;
 	struct af_vsockmon_hdr *hdr;
 	struct sk_buff *skb;
+	size_t payload_len;
+	void *payload_buf;
 
-	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
+	/* A packet could be split to fit the RX buffer, so we can retrieve
+	 * the payload length from the header and the buffer pointer taking
+	 * care of the offset in the original packet.
+	 */
+	payload_len = le32_to_cpu(pkt->hdr.len);
+	payload_buf = pkt->buf + pkt->off;
+
+	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
 			GFP_ATOMIC);
 	if (!skb)
 		return NULL;
@@ -176,8 +185,8 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
 
 	skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));
 
-	if (pkt->len) {
-		skb_put_data(skb, pkt->buf, pkt->len);
+	if (payload_len) {
+		skb_put_data(skb, payload_buf, payload_len);
 	}
 
 	return skb;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread
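
The splitting arithmetic in patch 5/8 can be checked with a small user-space sketch. It is only a model of the loop in vhost_transport_do_send_pkt(): `chunk_len()` and `buffers_needed()` are hypothetical helpers, and the header size is passed in as a parameter rather than taken from sizeof(pkt->hdr). Unlike the patch, the sketch refuses buffers that cannot hold at least one payload byte, so that the loop always makes progress.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Payload bytes that fit in one buffer of iov_len bytes once the header
 * has been accounted for, starting at offset pkt_off in the packet.
 */
static size_t chunk_len(size_t pkt_len, size_t pkt_off,
			size_t iov_len, size_t hdr_len)
{
	size_t payload_len;

	assert(pkt_off <= pkt_len);
	assert(iov_len >= hdr_len);

	payload_len = pkt_len - pkt_off;
	if (payload_len > iov_len - hdr_len)
		payload_len = iov_len - hdr_len;

	return payload_len;
}

/*
 * Number of buffers consumed to deliver one packet. Returns 0 when the
 * buffer has no room for payload (a guard added for this sketch).
 */
static unsigned int buffers_needed(size_t pkt_len, size_t iov_len,
				   size_t hdr_len)
{
	size_t off = 0;
	unsigned int n = 0;

	if (iov_len <= hdr_len)
		return 0;

	/* A header-only packet (pkt_len == 0) still uses one buffer. */
	do {
		off += chunk_len(pkt_len, off, iov_len, hdr_len);
		n++;
	} while (off < pkt_len);

	return n;
}
```

Assuming a 44-byte header, a 64 KiB payload would need 17 buffers of 4 KiB in this model, and 2 buffers of 64 KiB, because the header takes space in every buffer.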

* [PATCH v2 6/8] vsock/virtio: change the maximum packet size allowed
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (9 preceding siblings ...)
  2019-05-10 12:58 ` Stefano Garzarella
@ 2019-05-10 12:58 ` Stefano Garzarella
  2019-05-10 12:58 ` Stefano Garzarella
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-10 12:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi, Jason Wang

Now that we are able to split packets, we can avoid limiting
their size to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
packet size.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 3f313bcd6a26..63606525755d 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -219,8 +219,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	vvs = vsk->trans;
 
 	/* we can send less than pkt_len bytes */
-	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
-		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
 
 	/* virtio_transport_get_credit might return less than pkt_len credit */
 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v2 7/8] vsock/virtio: increase RX buffer size to 64 KiB
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (11 preceding siblings ...)
  2019-05-10 12:58 ` Stefano Garzarella
@ 2019-05-10 12:58 ` Stefano Garzarella
  2019-05-13 10:01   ` Jason Wang
  2019-05-13 10:01   ` Jason Wang
  2019-05-10 12:58 ` Stefano Garzarella
                   ` (4 subsequent siblings)
  17 siblings, 2 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-10 12:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi, Jason Wang

In order to increase host -> guest throughput with large packets,
we can use 64 KiB RX buffers.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/linux/virtio_vsock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 84b72026d327..5a9d25be72df 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -10,7 +10,7 @@
 #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
 #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
 #define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
-#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
 #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
 #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v2 8/8] vsock/virtio: make the RX buffer size tunable
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (14 preceding siblings ...)
  2019-05-10 12:58 ` [PATCH v2 8/8] vsock/virtio: make the RX buffer size tunable Stefano Garzarella
@ 2019-05-10 12:58 ` Stefano Garzarella
  2019-05-13 10:05     ` Jason Wang
  2019-05-13  9:33 ` [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Jason Wang
  2019-05-13  9:33 ` Jason Wang
  17 siblings, 1 reply; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-10 12:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi, Jason Wang

The RX buffer size determines the memory consumption of the
vsock/virtio guest driver, so we make it tunable through
a module parameter.

The allowed sizes are between 4 KiB and 64 KiB in order to be
compatible with old host drivers.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/linux/virtio_vsock.h     |  1 +
 net/vmw_vsock/virtio_transport.c | 27 ++++++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 5a9d25be72df..b9f8c3d91f80 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -13,6 +13,7 @@
 #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
 #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
 #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
+#define VIRTIO_VSOCK_MIN_PKT_BUF_SIZE		(1024 * 4)
 
 enum {
 	VSOCK_VQ_RX     = 0, /* for host to guest data */
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index af1d2ce12f54..732398b4e28f 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -66,6 +66,31 @@ struct virtio_vsock {
 	u32 guest_cid;
 };
 
+static unsigned int rx_buf_size = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+
+static int param_set_rx_buf_size(const char *val, const struct kernel_param *kp)
+{
+	unsigned int size;
+	int ret;
+
+	ret = kstrtouint(val, 0, &size);
+	if (ret)
+		return ret;
+
+	if (size < VIRTIO_VSOCK_MIN_PKT_BUF_SIZE ||
+	    size > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+		return -EINVAL;
+
+	return param_set_uint(val, kp);
+}
+
+static const struct kernel_param_ops param_ops_rx_buf_size = {
+	.set = param_set_rx_buf_size,
+	.get = param_get_uint,
+};
+
+module_param_cb(rx_buf_size, &param_ops_rx_buf_size, &rx_buf_size, 0644);
+
 static struct virtio_vsock *virtio_vsock_get(void)
 {
 	return the_virtio_vsock;
@@ -261,7 +286,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
 
 static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 {
-	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+	int buf_len = rx_buf_size;
 	struct virtio_vsock_pkt *pkt;
 	struct scatterlist hdr, buf, *sgs[2];
 	struct virtqueue *vq;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread
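
The range check done by param_set_rx_buf_size() in patch 8/8 can be tried in user space. The sketch below is an approximation under stated assumptions: `parse_rx_buf_size()` is a hypothetical name, strtoul() stands in for kstrtouint() (the two do not accept exactly the same inputs), and the limits copy the VIRTIO_VSOCK_MIN_PKT_BUF_SIZE and VIRTIO_VSOCK_MAX_PKT_BUF_SIZE values from the patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed values, copied from the defines in this series. */
#define MIN_PKT_BUF_SIZE (1024u * 4u)
#define MAX_PKT_BUF_SIZE (1024u * 64u)

/*
 * Parse val as an unsigned integer (base auto-detected, one trailing
 * newline tolerated) and accept it only inside the allowed range.
 * Returns 0 and stores the value in *out on success, -EINVAL otherwise;
 * *out is left untouched on failure.
 */
static int parse_rx_buf_size(const char *val, unsigned int *out)
{
	unsigned long size;
	char *end;

	errno = 0;
	size = strtoul(val, &end, 0);
	if (errno || end == val)
		return -EINVAL;

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (size < MIN_PKT_BUF_SIZE || size > MAX_PKT_BUF_SIZE)
		return -EINVAL;

	*out = (unsigned int)size;
	return 0;
}
```

Since virtio_vsock_rx_fill() reads rx_buf_size each time it runs, a value written at runtime would presumably affect only buffers allocated afterwards, not those already queued.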

* Re: [PATCH v2 2/8] vsock/virtio: free packets during the socket release
  2019-05-10 12:58 ` [PATCH v2 2/8] vsock/virtio: free packets during the socket release Stefano Garzarella
  2019-05-10 22:20   ` David Miller
@ 2019-05-10 22:20   ` David Miller
  2019-05-11  8:27       ` Stefano Garzarella
  2019-05-16 15:32   ` Stefan Hajnoczi
  2019-05-16 15:32   ` Stefan Hajnoczi
  3 siblings, 1 reply; 75+ messages in thread
From: David Miller @ 2019-05-10 22:20 UTC (permalink / raw)
  To: sgarzare
  Cc: netdev, mst, virtualization, linux-kernel, kvm, stefanha, jasowang

From: Stefano Garzarella <sgarzare@redhat.com>
Date: Fri, 10 May 2019 14:58:37 +0200

> @@ -827,12 +827,20 @@ static bool virtio_transport_close(struct vsock_sock *vsk)
>  
>  void virtio_transport_release(struct vsock_sock *vsk)
>  {
> +	struct virtio_vsock_sock *vvs = vsk->trans;
> +	struct virtio_vsock_buf *buf;
>  	struct sock *sk = &vsk->sk;
>  	bool remove_sock = true;
>  
>  	lock_sock(sk);
>  	if (sk->sk_type == SOCK_STREAM)
>  		remove_sock = virtio_transport_close(vsk);
> +	while (!list_empty(&vvs->rx_queue)) {
> +		buf = list_first_entry(&vvs->rx_queue,
> +				       struct virtio_vsock_buf, list);

Please use list_for_each_entry_safe().

^ permalink raw reply	[flat|nested] 75+ messages in thread
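
For readers unfamiliar with the helper requested in the review above, here is a user-space sketch of the iteration pattern. It is not the v3 patch: the list helpers are simplified stand-ins for the kernel's <linux/list.h>, `struct rx_buf` and `free_rx_queue()` are hypothetical, and `__typeof__` is a GCC/Clang extension. The point is that the macro caches the next entry before the loop body runs, so the current entry can be unlinked and freed safely.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel's doubly linked list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the kernel macro: n always holds the entry after pos. */
#define list_for_each_entry_safe(pos, n, head, member)                     \
	for (pos = container_of((head)->next, __typeof__(*pos), member),   \
	     n = container_of(pos->member.next, __typeof__(*pos), member); \
	     &pos->member != (head);                                       \
	     pos = n,                                                      \
	     n = container_of(n->member.next, __typeof__(*n), member))

/* Hypothetical queued receive buffer. */
struct rx_buf {
	struct list_head list;
	void *data;
};

/* Unlink and free every queued buffer; returns how many were freed. */
static unsigned int free_rx_queue(struct list_head *rx_queue)
{
	struct rx_buf *buf, *tmp;
	unsigned int freed = 0;

	list_for_each_entry_safe(buf, tmp, rx_queue, list) {
		list_del(&buf->list);
		free(buf->data);
		free(buf);
		freed++;
	}

	return freed;
}
```

Compared with the `while (!list_empty(...))` plus list_first_entry() form quoted above, the loop bounds are expressed once in the macro and the body only handles a single entry.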

* Re: [PATCH v2 2/8] vsock/virtio: free packets during the socket release
  2019-05-10 22:20   ` David Miller
@ 2019-05-11  8:27       ` Stefano Garzarella
  0 siblings, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-11  8:27 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, mst, virtualization, linux-kernel, kvm, stefanha, jasowang

On Fri, May 10, 2019 at 03:20:08PM -0700, David Miller wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
> Date: Fri, 10 May 2019 14:58:37 +0200
> 
> > @@ -827,12 +827,20 @@ static bool virtio_transport_close(struct vsock_sock *vsk)
> >  
> >  void virtio_transport_release(struct vsock_sock *vsk)
> >  {
> > +	struct virtio_vsock_sock *vvs = vsk->trans;
> > +	struct virtio_vsock_buf *buf;
> >  	struct sock *sk = &vsk->sk;
> >  	bool remove_sock = true;
> >  
> >  	lock_sock(sk);
> >  	if (sk->sk_type == SOCK_STREAM)
> >  		remove_sock = virtio_transport_close(vsk);
> > +	while (!list_empty(&vvs->rx_queue)) {
> > +		buf = list_first_entry(&vvs->rx_queue,
> > +				       struct virtio_vsock_buf, list);
> 
> Please use list_for_each_entry_safe().

Thanks for the review, I'll change it in the v3.

Cheers,
Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-10 12:58 ` Stefano Garzarella
@ 2019-05-12 16:57     ` Michael S. Tsirkin
  2019-05-13  9:58   ` Jason Wang
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2019-05-12 16:57 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, David S. Miller, virtualization, linux-kernel, kvm,
	Stefan Hajnoczi, Jason Wang

On Fri, May 10, 2019 at 02:58:36PM +0200, Stefano Garzarella wrote:
> Since virtio-vsock was introduced, the buffers filled by the host
> and pushed to the guest using the vring, are directly queued in
> a per-socket list avoiding to copy it.
> These buffers are preallocated by the guest with a fixed
> size (4 KB).
> 
> The maximum amount of memory used by each socket should be
> controlled by the credit mechanism.
> The default credit available per-socket is 256 KB, but if we use
> only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> buffers, using up to 1 GB of memory per-socket. In addition, the
> guest will continue to fill the vring with new 4 KB free buffers
> to avoid starvation of other sockets.
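
(Editorial aside, not part of the original reply.) The 1 GB figure quoted
above can be checked with a few lines of arithmetic. This is only a sketch
of the worst case described in the commit message; pinned_bytes() and the
two macros are hypothetical names, and the model assumes every queued
packet occupies one whole RX buffer regardless of its payload size.

```c
#include <stdint.h>

#define CREDIT_BYTES (256u * 1024u)	/* default per-socket credit */
#define RX_BUF_BYTES (4u * 1024u)	/* preallocated RX buffer size */

/* Memory held by the per-socket queue when each packet carries
 * 'payload' bytes but pins a full RX buffer of 'rx_buf_size' bytes.
 * The credit limits the payload bytes, not the buffers holding them. */
static uint64_t pinned_bytes(uint32_t credit, uint32_t payload,
			     uint32_t rx_buf_size)
{
	uint64_t nbufs;

	if (payload == 0)
		return 0;

	nbufs = credit / payload;	/* packets the credit allows */
	return nbufs * rx_buf_size;	/* memory actually pinned */
}
```

With 1-byte payloads the credit admits 262144 packets, each pinning 4 KB,
which is 1 GiB; with full 4 KB payloads the same credit pins exactly
256 KiB, matching the credit.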
> 
> This patch solves this issue copying the payload in a new buffer.
> Then it is queued in the per-socket list, and the 4KB buffer used
> by the host is freed.
> 
> In this way, the memory used by each socket respects the credit
> available, and we still avoid starvation, paying the cost of an
> extra memory copy. When the buffer is completely full we do a
> "zero-copy", moving the buffer directly in the per-socket list.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  drivers/vhost/vsock.c                   |  2 +
>  include/linux/virtio_vsock.h            |  8 +++
>  net/vmw_vsock/virtio_transport.c        |  1 +
>  net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
>  4 files changed, 81 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index bb5fc0e9fbc2..7964e2daee09 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>  		return NULL;
>  	}
>  
> +	pkt->buf_len = pkt->len;
> +
>  	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
>  	if (nbytes != pkt->len) {
>  		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index e223e2632edd..345f04ee9193 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
>  	void *buf;
>  	u32 len;
>  	u32 off;
> +	u32 buf_len;
>  	bool reply;
>  };
>  
> +struct virtio_vsock_buf {
> +	struct list_head list;
> +	void *addr;
> +	u32 len;
> +	u32 off;
> +};
> +
>  struct virtio_vsock_pkt_info {
>  	u32 remote_cid, remote_port;
>  	struct vsock_sock *vsk;
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 15eb5d3d4750..af1d2ce12f54 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>  			break;
>  		}
>  
> +		pkt->buf_len = buf_len;
>  		pkt->len = buf_len;
>  
>  		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 602715fc9a75..0248d6808755 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>  		pkt->buf = kmalloc(len, GFP_KERNEL);
>  		if (!pkt->buf)
>  			goto out_pkt;
> +
> +		pkt->buf_len = len;
> +
>  		err = memcpy_from_msg(pkt->buf, info->msg, len);
>  		if (err)
>  			goto out;
> @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>  	return NULL;
>  }
>  
> +static struct virtio_vsock_buf *
> +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> +{
> +	struct virtio_vsock_buf *buf;
> +
> +	if (pkt->len == 0)
> +		return NULL;
> +
> +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> +	if (!buf)
> +		return NULL;
> +
> +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> +	 * we are not use

we do not use

> more memory than that counted by the credit mechanism.
> +	 */
> +	if (zero_copy && pkt->len == pkt->buf_len) {
> +		buf->addr = pkt->buf;
> +		pkt->buf = NULL;
> +	} else {
> +		buf->addr = kmalloc(pkt->len, GFP_KERNEL);
> +		if (!buf->addr) {
> +			kfree(buf);
> +			return NULL;
> +		}
> +
> +		memcpy(buf->addr, pkt->buf, pkt->len);
> +	}
> +
> +	buf->len = pkt->len;
> +
> +	return buf;
> +}
> +
> +static void virtio_transport_free_buf(struct virtio_vsock_buf *buf)
> +{
> +	kfree(buf->addr);
> +	kfree(buf);
> +}
> +
>  /* Packet capture */
>  static struct sk_buff *virtio_transport_build_skb(void *opaque)
>  {
> @@ -190,17 +233,15 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	return virtio_transport_get_ops()->send_pkt(pkt);
>  }
>  
> -static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> -					struct virtio_vsock_pkt *pkt)
> +static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
>  {
> -	vvs->rx_bytes += pkt->len;
> +	vvs->rx_bytes += len;
>  }
>  
> -static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> -					struct virtio_vsock_pkt *pkt)
> +static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
>  {
> -	vvs->rx_bytes -= pkt->len;
> -	vvs->fwd_cnt += pkt->len;
> +	vvs->rx_bytes -= len;
> +	vvs->fwd_cnt += len;
>  }
>  
>  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> @@ -254,36 +295,36 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>  				   size_t len)
>  {
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> -	struct virtio_vsock_pkt *pkt;
> +	struct virtio_vsock_buf *buf;
>  	size_t bytes, total = 0;
>  	int err = -EFAULT;
>  
>  	spin_lock_bh(&vvs->rx_lock);
>  	while (total < len && !list_empty(&vvs->rx_queue)) {
> -		pkt = list_first_entry(&vvs->rx_queue,
> -				       struct virtio_vsock_pkt, list);
> +		buf = list_first_entry(&vvs->rx_queue,
> +				       struct virtio_vsock_buf, list);
>  
>  		bytes = len - total;
> -		if (bytes > pkt->len - pkt->off)
> -			bytes = pkt->len - pkt->off;
> +		if (bytes > buf->len - buf->off)
> +			bytes = buf->len - buf->off;
>  
>  		/* sk_lock is held by caller so no one else can dequeue.
>  		 * Unlock rx_lock since memcpy_to_msg() may sleep.
>  		 */
>  		spin_unlock_bh(&vvs->rx_lock);
>  
> -		err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
> +		err = memcpy_to_msg(msg, buf->addr + buf->off, bytes);
>  		if (err)
>  			goto out;
>  
>  		spin_lock_bh(&vvs->rx_lock);
>  
>  		total += bytes;
> -		pkt->off += bytes;
> -		if (pkt->off == pkt->len) {
> -			virtio_transport_dec_rx_pkt(vvs, pkt);
> -			list_del(&pkt->list);
> -			virtio_transport_free_pkt(pkt);
> +		buf->off += bytes;
> +		if (buf->off == buf->len) {
> +			virtio_transport_dec_rx_pkt(vvs, buf->len);
> +			list_del(&buf->list);
> +			virtio_transport_free_buf(buf);
>  		}
>  	}
>  	spin_unlock_bh(&vvs->rx_lock);
> @@ -841,20 +882,24 @@ virtio_transport_recv_connected(struct sock *sk,
>  {
>  	struct vsock_sock *vsk = vsock_sk(sk);
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> +	struct virtio_vsock_buf *buf;
>  	int err = 0;
>  
>  	switch (le16_to_cpu(pkt->hdr.op)) {
>  	case VIRTIO_VSOCK_OP_RW:
>  		pkt->len = le32_to_cpu(pkt->hdr.len);
> -		pkt->off = 0;
> +		buf = virtio_transport_alloc_buf(pkt, true);


This seems to be the only caller, and the second parameter
is always true. So why is it needed?
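
To make the question concrete, here is a userspace sketch of how the helper
might look with the flag dropped, so that the "buffer is full" test alone
selects the zero-copy path. This is an illustration and not the posted
patch: struct pkt, struct sock_buf, alloc_buf() and free_buf() are
hypothetical stand-ins, and malloc()/calloc() replace kmalloc()/kzalloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for virtio_vsock_pkt and virtio_vsock_buf. */
struct pkt {
	void *buf;
	uint32_t len;		/* payload bytes written by the host */
	uint32_t buf_len;	/* size of the allocation behind buf */
};

struct sock_buf {
	void *addr;
	uint32_t len;
	uint32_t off;
};

static struct sock_buf *alloc_buf(struct pkt *pkt)
{
	struct sock_buf *buf;

	if (pkt->len == 0)
		return NULL;

	buf = calloc(1, sizeof(*buf));
	if (!buf)
		return NULL;

	if (pkt->len == pkt->buf_len) {
		/* Buffer is full: take ownership instead of copying. */
		buf->addr = pkt->buf;
		pkt->buf = NULL;
	} else {
		/* Partially filled: copy only the bytes in use. */
		buf->addr = malloc(pkt->len);
		if (!buf->addr) {
			free(buf);
			return NULL;
		}
		memcpy(buf->addr, pkt->buf, pkt->len);
	}

	buf->len = pkt->len;
	return buf;
}

static void free_buf(struct sock_buf *buf)
{
	free(buf->addr);
	free(buf);
}
```

In this form the caller no longer passes a flag, and the memory queued per
socket still never exceeds the payload bytes counted by the credit.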

>  
> -		spin_lock_bh(&vvs->rx_lock);
> -		virtio_transport_inc_rx_pkt(vvs, pkt);
> -		list_add_tail(&pkt->list, &vvs->rx_queue);
> -		spin_unlock_bh(&vvs->rx_lock);
> +		if (buf) {
> +			spin_lock_bh(&vvs->rx_lock);
> +			virtio_transport_inc_rx_pkt(vvs, pkt->len);
> +			list_add_tail(&buf->list, &vvs->rx_queue);
> +			spin_unlock_bh(&vvs->rx_lock);
>  
> -		sk->sk_data_ready(sk);
> -		return err;
> +			sk->sk_data_ready(sk);
> +		}
> +
> +		break;
>  	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
>  		sk->sk_write_space(sk);
>  		break;
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (15 preceding siblings ...)
  2019-05-10 12:58 ` Stefano Garzarella
@ 2019-05-13  9:33 ` Jason Wang
  2019-05-13 16:49   ` Stefano Garzarella
                     ` (3 more replies)
  2019-05-13  9:33 ` Jason Wang
  17 siblings, 4 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-13  9:33 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
> While I was testing this new series (v2) I discovered an huge use of memory
> and a memory leak in the virtio-vsock driver in the guest when I sent
> 1-byte packets to the guest.
>
> These issues are present since the introduction of the virtio-vsock
> driver. I added the patches 1 and 2 to fix them in this series in order
> to better track the performance trends.
>
> v1: https://patchwork.kernel.org/cover/10885431/
>
> v2:
> - Add patch 1 to limit the memory usage
> - Add patch 2 to avoid memory leak during the socket release
> - Add patch 3 to fix locking of fwd_cnt and buf_alloc
> - Patch 4: fix 'free_space' type (u32 instead of s64) [Stefan]
> - Patch 5: Avoid integer underflow of iov_len [Stefan]
> - Patch 5: Fix packet capture in order to see the exact packets that are
>             delivered. [Stefan]
> - Add patch 8 to make the RX buffer size tunable [Stefan]
>
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support.
> As Micheal suggested in the v1, I booted host and guest with 'nosmap', and I
> added a column with virtio-net+vhost-net performance.
>
> A brief description of patches:
> - Patches 1+2: limit the memory usage with an extra copy and avoid memory leak
> - Patches 3+4: fix locking and reduce the number of credit update messages sent
>                 to the transmitter
> - Patches 5+6: allow the host to split packets on multiple buffers and use
>                 VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> - Patches 7+8: increase RX buffer size to 64 KiB
>
>                      host -> guest [Gbps]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64         0.068     0.063    0.130    0.131    0.128         0.188     0.187
> 256        0.274     0.236    0.392    0.338    0.282         0.749     0.654
> 512        0.531     0.457    0.862    0.725    0.602         1.419     1.414
> 1K         0.954     0.827    1.591    1.598    1.548         2.599     2.640
> 2K         1.783     1.543    3.731    3.637    3.469         4.530     4.754
> 4K         3.332     3.436    7.164    7.124    6.494         7.738     7.696
> 8K         5.792     5.530   11.653   11.787   11.444        12.307    11.850
> 16K        8.405     8.462   16.372   16.855   17.562        16.936    16.954
> 32K       14.208    13.669   18.945   20.009   23.128        21.980    23.015
> 64K       21.082    18.893   20.266   20.903   30.622        27.290    27.383
> 128K      20.696    20.148   20.112   21.746   32.152        30.446    30.990
> 256K      20.801    20.589   20.725   22.685   34.721        33.151    32.745
> 512K      21.220    20.465   20.432   22.106   34.496        36.847    31.096
>
>                      guest -> host [Gbps]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64         0.089     0.091    0.120    0.115    0.117         0.274     0.272
> 256        0.352     0.354    0.452    0.445    0.451         1.085     1.136
> 512        0.705     0.704    0.893    0.858    0.898         2.131     1.882
> 1K         1.394     1.433    1.721    1.669    1.691         3.984     3.576
> 2K         2.818     2.874    3.316    3.249    3.303         6.719     6.359
> 4K         5.293     5.397    6.129    5.933    6.082        10.105     9.860
> 8K         8.890     9.151   10.990   10.545   10.519        15.239    14.868
> 16K       11.444    11.018   12.074   15.255   15.577        20.551    20.848
> 32K       11.229    10.875   10.857   24.401   25.227        26.294    26.380
> 64K       10.832    10.545   10.816   39.487   39.616        34.996    32.041
> 128K      10.435    10.241   10.500   39.813   40.012        38.379    35.055
> 256K      10.263     9.866    9.845   34.971   35.143        36.559    37.232
> 512K      10.224    10.060   10.092   35.469   34.627        34.963    33.401
>
> As Stefan suggested in the v1, this time I measured also the efficiency in this
> way:
>      efficiency = Mbps / (%CPU_Host + %CPU_Guest)
>
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
>
>          host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64          0.94      0.59     3.96     4.06     4.09          2.82      2.11
> 256         2.62      2.50     6.45     6.09     5.81          9.64      8.73
> 512         5.16      4.87    13.16    12.39    11.67         17.83     17.76
> 1K          9.16      8.85    24.98    24.97    25.01         32.57     32.04
> 2K         17.41     17.03    49.09    48.59    49.22         55.31     57.14
> 4K         32.99     33.62    90.80    90.98    91.72         91.79     91.40
> 8K         58.51     59.98   153.53   170.83   167.31        137.51    132.85
> 16K        89.32     95.29   216.98   264.18   260.95        176.05    176.05
> 32K       152.94    167.10   285.75   387.02   360.81        215.49    226.30
> 64K       250.38    307.20   317.65   489.53   472.70        238.97    244.27
> 128K      327.99    335.24   335.76   523.71   486.41        253.29    260.86
> 256K      327.06    334.24   338.64   533.76   509.85        267.78    266.22
> 512K      337.36    330.61   334.95   512.90   496.35        280.42    241.43
>
>          guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64          0.90      0.91     1.37     1.32     1.35          2.15      2.13
> 256         3.59      3.55     5.23     5.19     5.29          8.50      8.89
> 512         7.19      7.08    10.21     9.95    10.38         16.74     14.71
> 1K         14.15     14.34    19.85    19.06    19.33         31.44     28.11
> 2K         28.44     29.09    37.78    37.18    37.49         53.07     50.63
> 4K         55.37     57.60    71.02    69.27    70.97         81.56     79.32
> 8K        105.58    100.45   111.95   124.68   123.61        120.85    118.66
> 16K       141.63    138.24   137.67   187.41   190.20        160.43    163.00
> 32K       147.56    143.09   138.48   296.41   301.04        214.64    223.94
> 64K       144.81    143.27   138.49   433.98   462.26        298.86    269.71
> 128K      150.14    147.99   146.85   511.36   514.29        350.17    298.09
> 256K      156.69    152.25   148.69   542.19   549.97        326.42    333.32
> 512K      157.29    153.35   152.22   546.52   533.24        315.55    302.27
>
> [1] https://github.com/stefano-garzarella/iperf/


Hi:

Do you have any explanation for why vsock is better here? Is this because
of the mergeable buffer? If so, we need to test with mrg_rxbuf=off.

Thanks


>
> Stefano Garzarella (8):
>    vsock/virtio: limit the memory used per-socket
>    vsock/virtio: free packets during the socket release
>    vsock/virtio: fix locking for fwd_cnt and buf_alloc
>    vsock/virtio: reduce credit update messages
>    vhost/vsock: split packets to send using multiple buffers
>    vsock/virtio: change the maximum packet size allowed
>    vsock/virtio: increase RX buffer size to 64 KiB
>    vsock/virtio: make the RX buffer size tunable
>
>   drivers/vhost/vsock.c                   |  53 +++++++--
>   include/linux/virtio_vsock.h            |  14 ++-
>   net/vmw_vsock/virtio_transport.c        |  28 ++++-
>   net/vmw_vsock/virtio_transport_common.c | 144 ++++++++++++++++++------
>   4 files changed, 190 insertions(+), 49 deletions(-)
>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput
  2019-05-10 12:58 [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (16 preceding siblings ...)
  2019-05-13  9:33 ` [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Jason Wang
@ 2019-05-13  9:33 ` Jason Wang
  17 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-13  9:33 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
	Stefan Hajnoczi, David S. Miller


On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
> While I was testing this new series (v2) I discovered an huge use of memory
> and a memory leak in the virtio-vsock driver in the guest when I sent
> 1-byte packets to the guest.
>
> These issues are present since the introduction of the virtio-vsock
> driver. I added the patches 1 and 2 to fix them in this series in order
> to better track the performance trends.
>
> v1: https://patchwork.kernel.org/cover/10885431/
>
> v2:
> - Add patch 1 to limit the memory usage
> - Add patch 2 to avoid memory leak during the socket release
> - Add patch 3 to fix locking of fwd_cnt and buf_alloc
> - Patch 4: fix 'free_space' type (u32 instead of s64) [Stefan]
> - Patch 5: Avoid integer underflow of iov_len [Stefan]
> - Patch 5: Fix packet capture in order to see the exact packets that are
>             delivered. [Stefan]
> - Add patch 8 to make the RX buffer size tunable [Stefan]
>
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support.
> As Micheal suggested in the v1, I booted host and guest with 'nosmap', and I
> added a column with virtio-net+vhost-net performance.
>
> A brief description of patches:
> - Patches 1+2: limit the memory usage with an extra copy and avoid memory leak
> - Patches 3+4: fix locking and reduce the number of credit update messages sent
>                 to the transmitter
> - Patches 5+6: allow the host to split packets on multiple buffers and use
>                 VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> - Patches 7+8: increase RX buffer size to 64 KiB
>
>                      host -> guest [Gbps]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64         0.068     0.063    0.130    0.131    0.128         0.188     0.187
> 256        0.274     0.236    0.392    0.338    0.282         0.749     0.654
> 512        0.531     0.457    0.862    0.725    0.602         1.419     1.414
> 1K         0.954     0.827    1.591    1.598    1.548         2.599     2.640
> 2K         1.783     1.543    3.731    3.637    3.469         4.530     4.754
> 4K         3.332     3.436    7.164    7.124    6.494         7.738     7.696
> 8K         5.792     5.530   11.653   11.787   11.444        12.307    11.850
> 16K        8.405     8.462   16.372   16.855   17.562        16.936    16.954
> 32K       14.208    13.669   18.945   20.009   23.128        21.980    23.015
> 64K       21.082    18.893   20.266   20.903   30.622        27.290    27.383
> 128K      20.696    20.148   20.112   21.746   32.152        30.446    30.990
> 256K      20.801    20.589   20.725   22.685   34.721        33.151    32.745
> 512K      21.220    20.465   20.432   22.106   34.496        36.847    31.096
>
>                      guest -> host [Gbps]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64         0.089     0.091    0.120    0.115    0.117         0.274     0.272
> 256        0.352     0.354    0.452    0.445    0.451         1.085     1.136
> 512        0.705     0.704    0.893    0.858    0.898         2.131     1.882
> 1K         1.394     1.433    1.721    1.669    1.691         3.984     3.576
> 2K         2.818     2.874    3.316    3.249    3.303         6.719     6.359
> 4K         5.293     5.397    6.129    5.933    6.082        10.105     9.860
> 8K         8.890     9.151   10.990   10.545   10.519        15.239    14.868
> 16K       11.444    11.018   12.074   15.255   15.577        20.551    20.848
> 32K       11.229    10.875   10.857   24.401   25.227        26.294    26.380
> 64K       10.832    10.545   10.816   39.487   39.616        34.996    32.041
> 128K      10.435    10.241   10.500   39.813   40.012        38.379    35.055
> 256K      10.263     9.866    9.845   34.971   35.143        36.559    37.232
> 512K      10.224    10.060   10.092   35.469   34.627        34.963    33.401
>
> As Stefan suggested in the v1, this time I measured also the efficiency in this
> way:
>      efficiency = Mbps / (%CPU_Host + %CPU_Guest)
>
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
>
>          host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64          0.94      0.59     3.96     4.06     4.09          2.82      2.11
> 256         2.62      2.50     6.45     6.09     5.81          9.64      8.73
> 512         5.16      4.87    13.16    12.39    11.67         17.83     17.76
> 1K          9.16      8.85    24.98    24.97    25.01         32.57     32.04
> 2K         17.41     17.03    49.09    48.59    49.22         55.31     57.14
> 4K         32.99     33.62    90.80    90.98    91.72         91.79     91.40
> 8K         58.51     59.98   153.53   170.83   167.31        137.51    132.85
> 16K        89.32     95.29   216.98   264.18   260.95        176.05    176.05
> 32K       152.94    167.10   285.75   387.02   360.81        215.49    226.30
> 64K       250.38    307.20   317.65   489.53   472.70        238.97    244.27
> 128K      327.99    335.24   335.76   523.71   486.41        253.29    260.86
> 256K      327.06    334.24   338.64   533.76   509.85        267.78    266.22
> 512K      337.36    330.61   334.95   512.90   496.35        280.42    241.43
>
>          guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64          0.90      0.91     1.37     1.32     1.35          2.15      2.13
> 256         3.59      3.55     5.23     5.19     5.29          8.50      8.89
> 512         7.19      7.08    10.21     9.95    10.38         16.74     14.71
> 1K         14.15     14.34    19.85    19.06    19.33         31.44     28.11
> 2K         28.44     29.09    37.78    37.18    37.49         53.07     50.63
> 4K         55.37     57.60    71.02    69.27    70.97         81.56     79.32
> 8K        105.58    100.45   111.95   124.68   123.61        120.85    118.66
> 16K       141.63    138.24   137.67   187.41   190.20        160.43    163.00
> 32K       147.56    143.09   138.48   296.41   301.04        214.64    223.94
> 64K       144.81    143.27   138.49   433.98   462.26        298.86    269.71
> 128K      150.14    147.99   146.85   511.36   514.29        350.17    298.09
> 256K      156.69    152.25   148.69   542.19   549.97        326.42    333.32
> 512K      157.29    153.35   152.22   546.52   533.24        315.55    302.27
>
> [1] https://github.com/stefano-garzarella/iperf/


Hi:

Do you have any explanation for why vsock is better here? Is this because
of the mergeable buffer? If so, we need to test with mrg_rxbuf=off.

Thanks


>
> Stefano Garzarella (8):
>    vsock/virtio: limit the memory used per-socket
>    vsock/virtio: free packets during the socket release
>    vsock/virtio: fix locking for fwd_cnt and buf_alloc
>    vsock/virtio: reduce credit update messages
>    vhost/vsock: split packets to send using multiple buffers
>    vsock/virtio: change the maximum packet size allowed
>    vsock/virtio: increase RX buffer size to 64 KiB
>    vsock/virtio: make the RX buffer size tunable
>
>   drivers/vhost/vsock.c                   |  53 +++++++--
>   include/linux/virtio_vsock.h            |  14 ++-
>   net/vmw_vsock/virtio_transport.c        |  28 ++++-
>   net/vmw_vsock/virtio_transport_common.c | 144 ++++++++++++++++++------
>   4 files changed, 190 insertions(+), 49 deletions(-)
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-10 12:58 ` Stefano Garzarella
  2019-05-12 16:57     ` Michael S. Tsirkin
@ 2019-05-13  9:58   ` Jason Wang
  2019-05-13 17:23     ` Stefano Garzarella
  2019-05-13 17:23     ` Stefano Garzarella
  2019-05-13  9:58   ` Jason Wang
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-13  9:58 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
> Since virtio-vsock was introduced, the buffers filled by the host
> and pushed to the guest using the vring are queued directly in
> a per-socket list, avoiding a copy.
> These buffers are preallocated by the guest with a fixed
> size (4 KB).
>
> The maximum amount of memory used by each socket should be
> controlled by the credit mechanism.
> The default credit available per-socket is 256 KB, but if we use
> only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> buffers, using up to 1 GB of memory per-socket. In addition, the
> guest will continue to fill the vring with new 4 KB free buffers
> to avoid starvation of other sockets.
>
> This patch solves the issue by copying the payload into a new buffer.
> The new buffer is then queued in the per-socket list, and the 4 KB
> buffer used by the host is freed.
>
> In this way, the memory used by each socket respects the credit
> available, and we still avoid starvation, paying the cost of an
> extra memory copy. When the buffer is completely full we do a
> "zero-copy", moving the buffer directly in the per-socket list.
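
The worst case described above can be checked with a few lines of C. This
is only a sketch of the arithmetic; the function name and its parameters
are made up for illustration and do not exist in the driver.

```c
#include <stdint.h>

/* Worst-case bytes of RX buffers held by one socket when every packet
 * carries `payload` bytes but occupies a whole `rx_buf`-byte buffer.
 * `credit` is the per-socket credit in bytes.
 * Illustrative helper only, not part of the virtio-vsock driver.
 */
static uint64_t worst_case_queued_bytes(uint64_t credit, uint64_t payload,
					uint64_t rx_buf)
{
	uint64_t packets;

	if (payload == 0)
		return 0;

	/* The credit counts payload bytes, so this many packets fit. */
	packets = credit / payload;

	/* Each queued packet keeps a full RX buffer alive. */
	return packets * rx_buf;
}
```

With a 256 KB credit, 1-byte payloads and 4 KB RX buffers this gives
262144 * 4096 bytes = 1 GiB. With full 4 KB payloads it gives 256 KB,
which matches the credit.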


I wonder whether, in the long run, we should use the generic socket
accounting mechanism provided by the kernel (e.g. socket, skb, sndbuf,
rcvbuf, truesize) instead of a vsock-specific one, to avoid duplicating
effort.


>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>   drivers/vhost/vsock.c                   |  2 +
>   include/linux/virtio_vsock.h            |  8 +++
>   net/vmw_vsock/virtio_transport.c        |  1 +
>   net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
>   4 files changed, 81 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index bb5fc0e9fbc2..7964e2daee09 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>   		return NULL;
>   	}
>   
> +	pkt->buf_len = pkt->len;
> +
>   	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
>   	if (nbytes != pkt->len) {
>   		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index e223e2632edd..345f04ee9193 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
>   	void *buf;
>   	u32 len;
>   	u32 off;
> +	u32 buf_len;
>   	bool reply;
>   };
>   
> +struct virtio_vsock_buf {
> +	struct list_head list;
> +	void *addr;
> +	u32 len;
> +	u32 off;
> +};
> +
>   struct virtio_vsock_pkt_info {
>   	u32 remote_cid, remote_port;
>   	struct vsock_sock *vsk;
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 15eb5d3d4750..af1d2ce12f54 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>   			break;
>   		}
>   
> +		pkt->buf_len = buf_len;
>   		pkt->len = buf_len;
>   
>   		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 602715fc9a75..0248d6808755 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>   		pkt->buf = kmalloc(len, GFP_KERNEL);
>   		if (!pkt->buf)
>   			goto out_pkt;
> +
> +		pkt->buf_len = len;
> +
>   		err = memcpy_from_msg(pkt->buf, info->msg, len);
>   		if (err)
>   			goto out;
> @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>   	return NULL;
>   }
>   
> +static struct virtio_vsock_buf *
> +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> +{
> +	struct virtio_vsock_buf *buf;
> +
> +	if (pkt->len == 0)
> +		return NULL;
> +
> +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> +	if (!buf)
> +		return NULL;
> +
> +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> +	 * we are not use more memory than that counted by the credit mechanism.
> +	 */
> +	if (zero_copy && pkt->len == pkt->buf_len) {
> +		buf->addr = pkt->buf;
> +		pkt->buf = NULL;
> +	} else {


Is the copy still needed if we're just a few bytes short? We met a similar
issue in virtio-net, and virtio-net solves it by always copying the first
128 bytes for big packets.

See receive_big().

Thanks


> +		buf->addr = kmalloc(pkt->len, GFP_KERNEL);
> +		if (!buf->addr) {
> +			kfree(buf);
> +			return NULL;
> +		}
> +
> +		memcpy(buf->addr, pkt->buf, pkt->len);
> +	}
> +
> +	buf->len = pkt->len;
> +
> +	return buf;
> +}
> +
> +static void virtio_transport_free_buf(struct virtio_vsock_buf *buf)
> +{
> +	kfree(buf->addr);
> +	kfree(buf);
> +}
> +
>   /* Packet capture */
>   static struct sk_buff *virtio_transport_build_skb(void *opaque)
>   {
> @@ -190,17 +233,15 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>   	return virtio_transport_get_ops()->send_pkt(pkt);
>   }
>   
> -static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> -					struct virtio_vsock_pkt *pkt)
> +static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
>   {
> -	vvs->rx_bytes += pkt->len;
> +	vvs->rx_bytes += len;
>   }
>   
> -static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> -					struct virtio_vsock_pkt *pkt)
> +static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
>   {
> -	vvs->rx_bytes -= pkt->len;
> -	vvs->fwd_cnt += pkt->len;
> +	vvs->rx_bytes -= len;
> +	vvs->fwd_cnt += len;
>   }
>   
>   void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> @@ -254,36 +295,36 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>   				   size_t len)
>   {
>   	struct virtio_vsock_sock *vvs = vsk->trans;
> -	struct virtio_vsock_pkt *pkt;
> +	struct virtio_vsock_buf *buf;
>   	size_t bytes, total = 0;
>   	int err = -EFAULT;
>   
>   	spin_lock_bh(&vvs->rx_lock);
>   	while (total < len && !list_empty(&vvs->rx_queue)) {
> -		pkt = list_first_entry(&vvs->rx_queue,
> -				       struct virtio_vsock_pkt, list);
> +		buf = list_first_entry(&vvs->rx_queue,
> +				       struct virtio_vsock_buf, list);
>   
>   		bytes = len - total;
> -		if (bytes > pkt->len - pkt->off)
> -			bytes = pkt->len - pkt->off;
> +		if (bytes > buf->len - buf->off)
> +			bytes = buf->len - buf->off;
>   
>   		/* sk_lock is held by caller so no one else can dequeue.
>   		 * Unlock rx_lock since memcpy_to_msg() may sleep.
>   		 */
>   		spin_unlock_bh(&vvs->rx_lock);
>   
> -		err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
> +		err = memcpy_to_msg(msg, buf->addr + buf->off, bytes);
>   		if (err)
>   			goto out;
>   
>   		spin_lock_bh(&vvs->rx_lock);
>   
>   		total += bytes;
> -		pkt->off += bytes;
> -		if (pkt->off == pkt->len) {
> -			virtio_transport_dec_rx_pkt(vvs, pkt);
> -			list_del(&pkt->list);
> -			virtio_transport_free_pkt(pkt);
> +		buf->off += bytes;
> +		if (buf->off == buf->len) {
> +			virtio_transport_dec_rx_pkt(vvs, buf->len);
> +			list_del(&buf->list);
> +			virtio_transport_free_buf(buf);
>   		}
>   	}
>   	spin_unlock_bh(&vvs->rx_lock);
> @@ -841,20 +882,24 @@ virtio_transport_recv_connected(struct sock *sk,
>   {
>   	struct vsock_sock *vsk = vsock_sk(sk);
>   	struct virtio_vsock_sock *vvs = vsk->trans;
> +	struct virtio_vsock_buf *buf;
>   	int err = 0;
>   
>   	switch (le16_to_cpu(pkt->hdr.op)) {
>   	case VIRTIO_VSOCK_OP_RW:
>   		pkt->len = le32_to_cpu(pkt->hdr.len);
> -		pkt->off = 0;
> +		buf = virtio_transport_alloc_buf(pkt, true);
>   
> -		spin_lock_bh(&vvs->rx_lock);
> -		virtio_transport_inc_rx_pkt(vvs, pkt);
> -		list_add_tail(&pkt->list, &vvs->rx_queue);
> -		spin_unlock_bh(&vvs->rx_lock);
> +		if (buf) {
> +			spin_lock_bh(&vvs->rx_lock);
> +			virtio_transport_inc_rx_pkt(vvs, pkt->len);
> +			list_add_tail(&buf->list, &vvs->rx_queue);
> +			spin_unlock_bh(&vvs->rx_lock);
>   
> -		sk->sk_data_ready(sk);
> -		return err;
> +			sk->sk_data_ready(sk);
> +		}
> +
> +		break;
>   	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
>   		sk->sk_write_space(sk);
>   		break;
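
One possible reading of the question about payloads that are only a few
bytes short of the buffer size is a slack threshold: hand the original
buffer over when the unused tail is small, and copy otherwise. The
userspace sketch below shows that policy. struct rx_buf, rx_buf_take() and
the slack parameter are all hypothetical; this is neither the code in the
patch nor what virtio-net's receive_big() does.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a queued receive buffer. */
struct rx_buf {
	void *addr;
	size_t len;
};

/* Take `len` payload bytes out of the `buf_len`-byte buffer at *src.
 *
 * Returns  1 if the buffer was moved (no copy, *src set to NULL),
 *          0 if the payload was copied (*src still owned by the caller),
 *         -1 on invalid length or allocation failure.
 */
static int rx_buf_take(struct rx_buf *dst, void **src, size_t len,
		       size_t buf_len, size_t slack)
{
	dst->addr = NULL;
	dst->len = 0;

	if (len == 0 || len > buf_len)
		return -1;

	/* Unused tail is small enough: steal the buffer instead of copying. */
	if (buf_len - len <= slack) {
		dst->addr = *src;
		dst->len = len;
		*src = NULL;
		return 1;
	}

	/* Otherwise allocate exactly what the payload needs and copy it. */
	dst->addr = malloc(len);
	if (!dst->addr)
		return -1;

	memcpy(dst->addr, *src, len);
	dst->len = len;
	return 0;
}
```

The trade-off is that every moved buffer may hold up to `slack` bytes that
the credit mechanism does not account for, so the threshold would have to
stay small.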

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 7/8] vsock/virtio: increase RX buffer size to 64 KiB
  2019-05-10 12:58 ` [PATCH v2 7/8] vsock/virtio: increase RX buffer size to 64 KiB Stefano Garzarella
@ 2019-05-13 10:01   ` Jason Wang
  2019-05-13 17:51     ` Stefano Garzarella
  2019-05-13 17:51     ` Stefano Garzarella
  2019-05-13 10:01   ` Jason Wang
  1 sibling, 2 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-13 10:01 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
> In order to increase host -> guest throughput with large packets,
> we can use 64 KiB RX buffers.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>   include/linux/virtio_vsock.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 84b72026d327..5a9d25be72df 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -10,7 +10,7 @@
>   #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
>   #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
>   #define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
> -#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
> +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
>   #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
>   #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
>   


We probably don't want such a high-order allocation. It's better to switch
to order-0 pages in this case. See add_recvbuf_big() in virtio-net.
If we get the datapath unified, we will get more of this settled.

Thanks
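
To make the concern concrete: with 4 KiB pages, a physically contiguous
64 KiB buffer is an order-4 allocation (16 pages), while a 4 KiB buffer is
order 0. The helper below is a userspace sketch of that calculation.
alloc_order() is an illustrative name, and it assumes page_size is a
non-zero power of two and that size is small enough for the shift not to
overflow. In the kernel, get_order() serves this purpose.

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= size.
 * Illustrative userspace helper; assumes page_size is a non-zero
 * power of two.
 */
static unsigned int alloc_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	/* Double the block size until it covers the request. */
	while ((page_size << order) < size)
		order++;

	return order;
}
```

For example, alloc_order(64 * 1024, 4096) is 4 and alloc_order(4096, 4096)
is 0.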


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 8/8] vsock/virtio: make the RX buffer size tunable
  2019-05-10 12:58 ` Stefano Garzarella
@ 2019-05-13 10:05     ` Jason Wang
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-13 10:05 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
> The RX buffer size determines the memory consumption of the
> vsock/virtio guest driver, so we make it tunable through
> a module parameter.
>
> The size allowed are between 4 KB and 64 KB in order to be
> compatible with old host drivers.
>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>


I don't see much value in doing this through the kernel command line. We
should deal with this automatically, as virtio-net does. Even a
module parameter would be better.

Thanks


> ---
>   include/linux/virtio_vsock.h     |  1 +
>   net/vmw_vsock/virtio_transport.c | 27 ++++++++++++++++++++++++++-
>   2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 5a9d25be72df..b9f8c3d91f80 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -13,6 +13,7 @@
>   #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
>   #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
>   #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
> +#define VIRTIO_VSOCK_MIN_PKT_BUF_SIZE		(1024 * 4)
>   
>   enum {
>   	VSOCK_VQ_RX     = 0, /* for host to guest data */
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index af1d2ce12f54..732398b4e28f 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -66,6 +66,31 @@ struct virtio_vsock {
>   	u32 guest_cid;
>   };
>   
> +static unsigned int rx_buf_size = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> +
> +static int param_set_rx_buf_size(const char *val, const struct kernel_param *kp)
> +{
> +	unsigned int size;
> +	int ret;
> +
> +	ret = kstrtouint(val, 0, &size);
> +	if (ret)
> +		return ret;
> +
> +	if (size < VIRTIO_VSOCK_MIN_PKT_BUF_SIZE ||
> +	    size > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> +		return -EINVAL;
> +
> +	return param_set_uint(val, kp);
> +};
> +
> +static const struct kernel_param_ops param_ops_rx_buf_size = {
> +	.set = param_set_rx_buf_size,
> +	.get = param_get_uint,
> +};
> +
> +module_param_cb(rx_buf_size, &param_ops_rx_buf_size, &rx_buf_size, 0644);
> +
>   static struct virtio_vsock *virtio_vsock_get(void)
>   {
>   	return the_virtio_vsock;
> @@ -261,7 +286,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
>   
>   static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>   {
> -	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> +	int buf_len = rx_buf_size;
>   	struct virtio_vsock_pkt *pkt;
>   	struct scatterlist hdr, buf, *sgs[2];
>   	struct virtqueue *vq;
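
The range check in param_set_rx_buf_size() above can be tried outside the
kernel. This sketch is a userspace analogue that assumes strtoul() in
place of kstrtouint(); parse_rx_buf_size() and the RX_BUF_* macros are
illustrative names, and unlike kstrtouint() it does not accept a trailing
newline.

```c
#include <errno.h>
#include <stdlib.h>

/* Same bounds as the MIN/MAX packet buffer sizes in the patch. */
#define RX_BUF_MIN	(1024u * 4)
#define RX_BUF_MAX	(1024u * 64)

/* Parse `val` (decimal, octal or hex) and store it in *out only if it
 * lies within [RX_BUF_MIN, RX_BUF_MAX]. Returns 0 on success and
 * -EINVAL otherwise, leaving *out untouched on failure.
 */
static int parse_rx_buf_size(const char *val, unsigned int *out)
{
	unsigned long size;
	char *end;

	errno = 0;
	size = strtoul(val, &end, 0);

	/* Reject overflow, empty input and trailing garbage. */
	if (errno || end == val || *end != '\0')
		return -EINVAL;

	if (size < RX_BUF_MIN || size > RX_BUF_MAX)
		return -EINVAL;

	*out = (unsigned int)size;
	return 0;
}
```

Values such as "8192" or "0x10000" are accepted, while "1024", "65537" and
non-numeric strings are rejected.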

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 8/8] vsock/virtio: make the RX buffer size tunable
@ 2019-05-13 10:05     ` Jason Wang
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-13 10:05 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
	Stefan Hajnoczi, David S. Miller


On 2019/5/10 下午8:58, Stefano Garzarella wrote:
> The RX buffer size determines the memory consumption of the
> vsock/virtio guest driver, so we make it tunable through
> a module parameter.
>
> The size allowed are between 4 KB and 64 KB in order to be
> compatible with old host drivers.
>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>


I don't see much value of doing this through kernel command line. We 
should deal with them automatically like what virtio-net did. Or even a 
module parameter is better.

Thanks


> ---
>   include/linux/virtio_vsock.h     |  1 +
>   net/vmw_vsock/virtio_transport.c | 27 ++++++++++++++++++++++++++-
>   2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 5a9d25be72df..b9f8c3d91f80 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -13,6 +13,7 @@
>   #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
>   #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
>   #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
> +#define VIRTIO_VSOCK_MIN_PKT_BUF_SIZE		(1024 * 4)
>   
>   enum {
>   	VSOCK_VQ_RX     = 0, /* for host to guest data */
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index af1d2ce12f54..732398b4e28f 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -66,6 +66,31 @@ struct virtio_vsock {
>   	u32 guest_cid;
>   };
>   
> +static unsigned int rx_buf_size = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> +
> +static int param_set_rx_buf_size(const char *val, const struct kernel_param *kp)
> +{
> +	unsigned int size;
> +	int ret;
> +
> +	ret = kstrtouint(val, 0, &size);
> +	if (ret)
> +		return ret;
> +
> +	if (size < VIRTIO_VSOCK_MIN_PKT_BUF_SIZE ||
> +	    size > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> +		return -EINVAL;
> +
> +	return param_set_uint(val, kp);
> +};
> +
> +static const struct kernel_param_ops param_ops_rx_buf_size = {
> +	.set = param_set_rx_buf_size,
> +	.get = param_get_uint,
> +};
> +
> +module_param_cb(rx_buf_size, &param_ops_rx_buf_size, &rx_buf_size, 0644);
> +
>   static struct virtio_vsock *virtio_vsock_get(void)
>   {
>   	return the_virtio_vsock;
> @@ -261,7 +286,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
>   
>   static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>   {
> -	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> +	int buf_len = rx_buf_size;
>   	struct virtio_vsock_pkt *pkt;
>   	struct scatterlist hdr, buf, *sgs[2];
>   	struct virtqueue *vq;

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 8/8] vsock/virtio: make the RX buffer size tunable
  2019-05-13 10:05     ` Jason Wang
  (?)
  (?)
@ 2019-05-13 12:46     ` Jason Wang
  2019-05-14 16:10       ` Stefano Garzarella
  2019-05-14 16:10       ` Stefano Garzarella
  -1 siblings, 2 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-13 12:46 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/13 6:05 PM, Jason Wang wrote:
>
> On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
>> The RX buffer size determines the memory consumption of the
>> vsock/virtio guest driver, so we make it tunable through
>> a module parameter.
>>
>> The sizes allowed are between 4 KB and 64 KB in order to be
>> compatible with old host drivers.
>>
>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
>
> I don't see much value in doing this through the kernel command line. We
> should handle this automatically, as virtio-net does. Even a module
> parameter would be better.
>
> Thanks


Sorry, I misread the patch. But even a module parameter is not
flexible enough. We should handle this transparently.

Thanks


>
>
>> ---
>>   include/linux/virtio_vsock.h     |  1 +
>>   net/vmw_vsock/virtio_transport.c | 27 ++++++++++++++++++++++++++-
>>   2 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index 5a9d25be72df..b9f8c3d91f80 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -13,6 +13,7 @@
>>   #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE    (1024 * 64)
>>   #define VIRTIO_VSOCK_MAX_BUF_SIZE        0xFFFFFFFFUL
>>   #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE        (1024 * 64)
>> +#define VIRTIO_VSOCK_MIN_PKT_BUF_SIZE        (1024 * 4)
>>
>>   enum {
>>       VSOCK_VQ_RX     = 0, /* for host to guest data */
>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>> index af1d2ce12f54..732398b4e28f 100644
>> --- a/net/vmw_vsock/virtio_transport.c
>> +++ b/net/vmw_vsock/virtio_transport.c
>> @@ -66,6 +66,31 @@ struct virtio_vsock {
>>       u32 guest_cid;
>>   };
>>
>> +static unsigned int rx_buf_size = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
>> +
>> +static int param_set_rx_buf_size(const char *val, const struct kernel_param *kp)
>> +{
>> +    unsigned int size;
>> +    int ret;
>> +
>> +    ret = kstrtouint(val, 0, &size);
>> +    if (ret)
>> +        return ret;
>> +
>> +    if (size < VIRTIO_VSOCK_MIN_PKT_BUF_SIZE ||
>> +        size > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>> +        return -EINVAL;
>> +
>> +    return param_set_uint(val, kp);
>> +};
>> +
>> +static const struct kernel_param_ops param_ops_rx_buf_size = {
>> +    .set = param_set_rx_buf_size,
>> +    .get = param_get_uint,
>> +};
>> +
>> +module_param_cb(rx_buf_size, &param_ops_rx_buf_size, &rx_buf_size, 0644);
>> +
>>   static struct virtio_vsock *virtio_vsock_get(void)
>>   {
>>       return the_virtio_vsock;
>> @@ -261,7 +286,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
>>
>>   static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>>   {
>> -    int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
>> +    int buf_len = rx_buf_size;
>>       struct virtio_vsock_pkt *pkt;
>>       struct scatterlist hdr, buf, *sgs[2];
>>       struct virtqueue *vq;
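
For readers following the discussion, the range check done by
param_set_rx_buf_size() in the quoted patch can be approximated in user
space as below. This is only a sketch under assumptions: MIN_RX_BUF_SIZE,
MAX_RX_BUF_SIZE and parse_rx_buf_size() are hypothetical names, strtoul()
stands in for the kernel's kstrtouint() (which differs in details such as
accepting a trailing newline), and the real handler stores the value
through param_set_uint() afterwards.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical user-space mirrors of the limits used in the quoted patch. */
#define MIN_RX_BUF_SIZE (1024u * 4)   /* VIRTIO_VSOCK_MIN_PKT_BUF_SIZE */
#define MAX_RX_BUF_SIZE (1024u * 64)  /* VIRTIO_VSOCK_MAX_PKT_BUF_SIZE */

/*
 * Parse val (base auto-detected, like kstrtouint(val, 0, ...)) and accept
 * it only if it lies in [MIN_RX_BUF_SIZE, MAX_RX_BUF_SIZE].
 * Returns 0 and stores the value in *out on success, -EINVAL otherwise.
 */
static int parse_rx_buf_size(const char *val, unsigned int *out)
{
	char *end;
	unsigned long size;

	errno = 0;
	size = strtoul(val, &end, 0);
	if (errno || end == val || *end != '\0')
		return -EINVAL;

	if (size < MIN_RX_BUF_SIZE || size > MAX_RX_BUF_SIZE)
		return -EINVAL;

	*out = (unsigned int)size;
	return 0;
}
```

With the patch applied and mode 0644, the parameter would normally be
visible under /sys/module/<module>/parameters/rx_buf_size; the module name
is not stated in this message, so it is left as a placeholder.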

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-12 16:57     ` Michael S. Tsirkin
  (?)
  (?)
@ 2019-05-13 16:40     ` Stefano Garzarella
  -1 siblings, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-13 16:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, David S. Miller, virtualization, linux-kernel, kvm,
	Stefan Hajnoczi, Jason Wang

On Sun, May 12, 2019 at 12:57:48PM -0400, Michael S. Tsirkin wrote:
> On Fri, May 10, 2019 at 02:58:36PM +0200, Stefano Garzarella wrote:
> > Since virtio-vsock was introduced, the buffers filled by the host
> > and pushed to the guest using the vring are queued directly in
> > a per-socket list, avoiding a copy.
> > These buffers are preallocated by the guest with a fixed
> > size (4 KB).
> > 
> > The maximum amount of memory used by each socket should be
> > controlled by the credit mechanism.
> > The default credit available per-socket is 256 KB, but if we use
> > only 1 byte per packet, the guest can queue up to 262144 4 KB
> > buffers, using up to 1 GB of memory per-socket. In addition, the
> > guest will continue to fill the vring with new 4 KB free buffers
> > to avoid starvation of other sockets.
> > 
> > This patch solves the issue by copying the payload into a new buffer.
> > The new buffer is then queued in the per-socket list, and the 4 KB
> > buffer used by the host is freed.
> > 
> > In this way, the memory used by each socket respects the credit
> > available, and we still avoid starvation, paying the cost of an
> > extra memory copy. When the buffer is completely full we do a
> > "zero-copy", moving the buffer directly in the per-socket list.
> > 
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> >  drivers/vhost/vsock.c                   |  2 +
> >  include/linux/virtio_vsock.h            |  8 +++
> >  net/vmw_vsock/virtio_transport.c        |  1 +
> >  net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
> >  4 files changed, 81 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index bb5fc0e9fbc2..7964e2daee09 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> >  		return NULL;
> >  	}
> >  
> > +	pkt->buf_len = pkt->len;
> > +
> >  	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
> >  	if (nbytes != pkt->len) {
> >  		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index e223e2632edd..345f04ee9193 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
> >  	void *buf;
> >  	u32 len;
> >  	u32 off;
> > +	u32 buf_len;
> >  	bool reply;
> >  };
> >  
> > +struct virtio_vsock_buf {
> > +	struct list_head list;
> > +	void *addr;
> > +	u32 len;
> > +	u32 off;
> > +};
> > +
> >  struct virtio_vsock_pkt_info {
> >  	u32 remote_cid, remote_port;
> >  	struct vsock_sock *vsk;
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index 15eb5d3d4750..af1d2ce12f54 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> >  			break;
> >  		}
> >  
> > +		pkt->buf_len = buf_len;
> >  		pkt->len = buf_len;
> >  
> >  		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 602715fc9a75..0248d6808755 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> >  		pkt->buf = kmalloc(len, GFP_KERNEL);
> >  		if (!pkt->buf)
> >  			goto out_pkt;
> > +
> > +		pkt->buf_len = len;
> > +
> >  		err = memcpy_from_msg(pkt->buf, info->msg, len);
> >  		if (err)
> >  			goto out;
> > @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> >  	return NULL;
> >  }
> >  
> > +static struct virtio_vsock_buf *
> > +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> > +{
> > +	struct virtio_vsock_buf *buf;
> > +
> > +	if (pkt->len == 0)
> > +		return NULL;
> > +
> > +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> > +	if (!buf)
> > +		return NULL;
> > +
> > +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> > +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> > +	 * we are not use
> 
> we do not use
> 

Oh thanks! Will fix!

> > more memory than that counted by the credit mechanism.
> > +	 */
> > +	if (zero_copy && pkt->len == pkt->buf_len) {
> > +		buf->addr = pkt->buf;
> > +		pkt->buf = NULL;
> > +	} else {
> > +		buf->addr = kmalloc(pkt->len, GFP_KERNEL);
> > +		if (!buf->addr) {
> > +			kfree(buf);
> > +			return NULL;
> > +		}
> > +
> > +		memcpy(buf->addr, pkt->buf, pkt->len);
> > +	}
> > +
> > +	buf->len = pkt->len;
> > +
> > +	return buf;
> > +}
> > +
> > +static void virtio_transport_free_buf(struct virtio_vsock_buf *buf)
> > +{
> > +	kfree(buf->addr);
> > +	kfree(buf);
> > +}
> > +
> >  /* Packet capture */
> >  static struct sk_buff *virtio_transport_build_skb(void *opaque)
> >  {
> > @@ -190,17 +233,15 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >  	return virtio_transport_get_ops()->send_pkt(pkt);
> >  }
> >  
> > -static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> > -					struct virtio_vsock_pkt *pkt)
> > +static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
> >  {
> > -	vvs->rx_bytes += pkt->len;
> > +	vvs->rx_bytes += len;
> >  }
> >  
> > -static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> > -					struct virtio_vsock_pkt *pkt)
> > +static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, u32 len)
> >  {
> > -	vvs->rx_bytes -= pkt->len;
> > -	vvs->fwd_cnt += pkt->len;
> > +	vvs->rx_bytes -= len;
> > +	vvs->fwd_cnt += len;
> >  }
> >  
> >  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> > @@ -254,36 +295,36 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> >  				   size_t len)
> >  {
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > -	struct virtio_vsock_pkt *pkt;
> > +	struct virtio_vsock_buf *buf;
> >  	size_t bytes, total = 0;
> >  	int err = -EFAULT;
> >  
> >  	spin_lock_bh(&vvs->rx_lock);
> >  	while (total < len && !list_empty(&vvs->rx_queue)) {
> > -		pkt = list_first_entry(&vvs->rx_queue,
> > -				       struct virtio_vsock_pkt, list);
> > +		buf = list_first_entry(&vvs->rx_queue,
> > +				       struct virtio_vsock_buf, list);
> >  
> >  		bytes = len - total;
> > -		if (bytes > pkt->len - pkt->off)
> > -			bytes = pkt->len - pkt->off;
> > +		if (bytes > buf->len - buf->off)
> > +			bytes = buf->len - buf->off;
> >  
> >  		/* sk_lock is held by caller so no one else can dequeue.
> >  		 * Unlock rx_lock since memcpy_to_msg() may sleep.
> >  		 */
> >  		spin_unlock_bh(&vvs->rx_lock);
> >  
> > -		err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
> > +		err = memcpy_to_msg(msg, buf->addr + buf->off, bytes);
> >  		if (err)
> >  			goto out;
> >  
> >  		spin_lock_bh(&vvs->rx_lock);
> >  
> >  		total += bytes;
> > -		pkt->off += bytes;
> > -		if (pkt->off == pkt->len) {
> > -			virtio_transport_dec_rx_pkt(vvs, pkt);
> > -			list_del(&pkt->list);
> > -			virtio_transport_free_pkt(pkt);
> > +		buf->off += bytes;
> > +		if (buf->off == buf->len) {
> > +			virtio_transport_dec_rx_pkt(vvs, buf->len);
> > +			list_del(&buf->list);
> > +			virtio_transport_free_buf(buf);
> >  		}
> >  	}
> >  	spin_unlock_bh(&vvs->rx_lock);
> > @@ -841,20 +882,24 @@ virtio_transport_recv_connected(struct sock *sk,
> >  {
> >  	struct vsock_sock *vsk = vsock_sk(sk);
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > +	struct virtio_vsock_buf *buf;
> >  	int err = 0;
> >  
> >  	switch (le16_to_cpu(pkt->hdr.op)) {
> >  	case VIRTIO_VSOCK_OP_RW:
> >  		pkt->len = le32_to_cpu(pkt->hdr.len);
> > -		pkt->off = 0;
> > +		buf = virtio_transport_alloc_buf(pkt, true);
> 
> 
> This seems to be the only caller, and the second parameter
> is always true. So why is it needed?

Right. It was a leftover, I'll remove it.
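
As an illustration of the move-or-copy decision once the always-true flag
is dropped, here is a user-space sketch. It is not the follow-up version of
the patch, which is not shown in this thread: struct pkt, struct rx_buf,
alloc_rx_buf() and free_rx_buf() are hypothetical stand-ins, and
malloc()/calloc() replace kmalloc()/kzalloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified user-space stand-ins for the structures in the patch. */
struct pkt {
	void *buf;        /* payload storage */
	uint32_t len;     /* bytes of payload actually used */
	uint32_t buf_len; /* size of the allocation behind buf */
};

struct rx_buf {
	void *addr;
	uint32_t len;
	uint32_t off;
};

/*
 * Build an rx_buf from a received packet. A completely full packet buffer
 * is moved (ownership transfer, no copy); a partially filled one is copied
 * into an allocation of exactly pkt->len bytes.
 * Returns NULL for an empty packet or on allocation failure.
 */
static struct rx_buf *alloc_rx_buf(struct pkt *pkt)
{
	struct rx_buf *buf;

	if (pkt->len == 0)
		return NULL;

	buf = calloc(1, sizeof(*buf));
	if (!buf)
		return NULL;

	if (pkt->len == pkt->buf_len) {
		/* Full buffer: hand it over and detach it from the packet. */
		buf->addr = pkt->buf;
		pkt->buf = NULL;
	} else {
		/* Partial buffer: keep only the bytes that carry payload. */
		buf->addr = malloc(pkt->len);
		if (!buf->addr) {
			free(buf);
			return NULL;
		}
		memcpy(buf->addr, pkt->buf, pkt->len);
	}

	buf->len = pkt->len;
	return buf;
}

/* Release the payload and the descriptor together. */
static void free_rx_buf(struct rx_buf *buf)
{
	free(buf->addr);
	free(buf);
}
```

In the copy case the caller still owns pkt->buf and has to free it; in the
move case pkt->buf is NULL afterwards, so freeing the packet cannot free
the payload a second time.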

> 
> >  
> > -		spin_lock_bh(&vvs->rx_lock);
> > -		virtio_transport_inc_rx_pkt(vvs, pkt);
> > -		list_add_tail(&pkt->list, &vvs->rx_queue);
> > -		spin_unlock_bh(&vvs->rx_lock);
> > +		if (buf) {
> > +			spin_lock_bh(&vvs->rx_lock);
> > +			virtio_transport_inc_rx_pkt(vvs, pkt->len);
> > +			list_add_tail(&buf->list, &vvs->rx_queue);
> > +			spin_unlock_bh(&vvs->rx_lock);
> >  
> > -		sk->sk_data_ready(sk);
> > -		return err;
> > +			sk->sk_data_ready(sk);
> > +		}
> > +
> > +		break;
> >  	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
> >  		sk->sk_write_space(sk);
> >  		break;

Thanks for the review,
Stefano
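
The worst case described in the commit message (256 KB of credit, 1-byte
packets, 4 KB buffers) can be checked with a few lines of arithmetic. The
helpers below are hypothetical and count payload memory only; per-packet
metadata is ignored, so the real numbers would be somewhat higher.

```c
#include <stdint.h>

/*
 * Payload memory queued on one socket when every packet carries `payload`
 * bytes but pins a whole rx buffer of `buf_size` bytes, as before the
 * patch. Requires payload > 0.
 */
static uint64_t queued_bytes_without_copy(uint64_t credit, uint64_t payload,
					  uint64_t buf_size)
{
	uint64_t packets = credit / payload; /* packets the credit allows */

	return packets * buf_size;
}

/*
 * Payload memory queued when each payload is copied into a buffer of
 * exactly its own size, as the patch does for partially filled buffers.
 * Requires payload > 0.
 */
static uint64_t queued_bytes_with_copy(uint64_t credit, uint64_t payload)
{
	return (credit / payload) * payload;
}
```

For credit = 256 * 1024, payload = 1 and buf_size = 4096 the first helper
gives 262144 * 4096 = 1073741824 bytes (1 GiB), while the second stays at
262144 bytes, which is the credit itself.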

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput
  2019-05-13  9:33 ` [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Jason Wang
  2019-05-13 16:49   ` Stefano Garzarella
@ 2019-05-13 16:49   ` Stefano Garzarella
  2019-05-20 14:09   ` Stefano Garzarella
  2019-05-20 14:09   ` Stefano Garzarella
  3 siblings, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-13 16:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi

On Mon, May 13, 2019 at 05:33:40PM +0800, Jason Wang wrote:
> 
> On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
> > While I was testing this new series (v2) I discovered a huge use of memory
> > and a memory leak in the virtio-vsock driver in the guest when I sent
> > 1-byte packets to the guest.
> > 
> > These issues are present since the introduction of the virtio-vsock
> > driver. I added the patches 1 and 2 to fix them in this series in order
> > to better track the performance trends.
> > 
> > v1: https://patchwork.kernel.org/cover/10885431/
> > 
> > v2:
> > - Add patch 1 to limit the memory usage
> > - Add patch 2 to avoid memory leak during the socket release
> > - Add patch 3 to fix locking of fwd_cnt and buf_alloc
> > - Patch 4: fix 'free_space' type (u32 instead of s64) [Stefan]
> > - Patch 5: Avoid integer underflow of iov_len [Stefan]
> > - Patch 5: Fix packet capture in order to see the exact packets that are
> >             delivered. [Stefan]
> > - Add patch 8 to make the RX buffer size tunable [Stefan]
> > 
> > Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> > support.
> > As Michael suggested in the v1, I booted host and guest with 'nosmap', and I
> > added a column with virtio-net+vhost-net performance.
> > 
> > A brief description of patches:
> > - Patches 1+2: limit the memory usage with an extra copy and avoid memory leak
> > - Patches 3+4: fix locking and reduce the number of credit update messages sent
> >                 to the transmitter
> > - Patches 5+6: allow the host to split packets on multiple buffers and use
> >                 VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> > - Patches 7+8: increase RX buffer size to 64 KiB
> > 
> >                      host -> guest [Gbps]
> > pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
> >                                                                       TCP_NODELAY
> > 64         0.068     0.063    0.130    0.131    0.128         0.188     0.187
> > 256        0.274     0.236    0.392    0.338    0.282         0.749     0.654
> > 512        0.531     0.457    0.862    0.725    0.602         1.419     1.414
> > 1K         0.954     0.827    1.591    1.598    1.548         2.599     2.640
> > 2K         1.783     1.543    3.731    3.637    3.469         4.530     4.754
> > 4K         3.332     3.436    7.164    7.124    6.494         7.738     7.696
> > 8K         5.792     5.530   11.653   11.787   11.444        12.307    11.850
> > 16K        8.405     8.462   16.372   16.855   17.562        16.936    16.954
> > 32K       14.208    13.669   18.945   20.009   23.128        21.980    23.015
> > 64K       21.082    18.893   20.266   20.903   30.622        27.290    27.383
> > 128K      20.696    20.148   20.112   21.746   32.152        30.446    30.990
> > 256K      20.801    20.589   20.725   22.685   34.721        33.151    32.745
> > 512K      21.220    20.465   20.432   22.106   34.496        36.847    31.096
> > 
> >                      guest -> host [Gbps]
> > pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
> >                                                                       TCP_NODELAY
> > 64         0.089     0.091    0.120    0.115    0.117         0.274     0.272
> > 256        0.352     0.354    0.452    0.445    0.451         1.085     1.136
> > 512        0.705     0.704    0.893    0.858    0.898         2.131     1.882
> > 1K         1.394     1.433    1.721    1.669    1.691         3.984     3.576
> > 2K         2.818     2.874    3.316    3.249    3.303         6.719     6.359
> > 4K         5.293     5.397    6.129    5.933    6.082        10.105     9.860
> > 8K         8.890     9.151   10.990   10.545   10.519        15.239    14.868
> > 16K       11.444    11.018   12.074   15.255   15.577        20.551    20.848
> > 32K       11.229    10.875   10.857   24.401   25.227        26.294    26.380
> > 64K       10.832    10.545   10.816   39.487   39.616        34.996    32.041
> > 128K      10.435    10.241   10.500   39.813   40.012        38.379    35.055
> > 256K      10.263     9.866    9.845   34.971   35.143        36.559    37.232
> > 512K      10.224    10.060   10.092   35.469   34.627        34.963    33.401
> > 
> > As Stefan suggested in the v1, this time I measured also the efficiency in this
> > way:
> >      efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> > 
> > The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> > but it's provided for free from iperf3 and could be an indication.
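
The efficiency metric defined above can be written as a small helper. This is only a sketch: the function name is hypothetical, and it assumes the throughput is taken from the Gbps tables with 1 Gbps = 1000 Mbps.

```c
#include <assert.h>

/* efficiency = Mbps / (%CPU_Host + %CPU_Guest), as defined in the cover letter.
 * gbps:           throughput as listed in the Gbps tables
 * cpu_host_pct:   host CPU utilisation in percent
 * cpu_guest_pct:  guest CPU utilisation in percent (measured inside the VM)
 * The function name is an assumption made for this example. */
static double vsock_efficiency(double gbps, double cpu_host_pct,
			       double cpu_guest_pct)
{
	double total_cpu = cpu_host_pct + cpu_guest_pct;

	assert(total_cpu > 0.0);	/* avoid dividing by zero */
	return (gbps * 1000.0) / total_cpu;
}
```

Read the other way around, the 64K host -> guest entries for "p 7+8" (30.622 Gbps and an efficiency of 472.70) imply a combined CPU usage of roughly 30622 / 472.70, i.e. about 64.8%.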
> > 
> >          host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
> >                                                                       TCP_NODELAY
> > 64          0.94      0.59     3.96     4.06     4.09          2.82      2.11
> > 256         2.62      2.50     6.45     6.09     5.81          9.64      8.73
> > 512         5.16      4.87    13.16    12.39    11.67         17.83     17.76
> > 1K          9.16      8.85    24.98    24.97    25.01         32.57     32.04
> > 2K         17.41     17.03    49.09    48.59    49.22         55.31     57.14
> > 4K         32.99     33.62    90.80    90.98    91.72         91.79     91.40
> > 8K         58.51     59.98   153.53   170.83   167.31        137.51    132.85
> > 16K        89.32     95.29   216.98   264.18   260.95        176.05    176.05
> > 32K       152.94    167.10   285.75   387.02   360.81        215.49    226.30
> > 64K       250.38    307.20   317.65   489.53   472.70        238.97    244.27
> > 128K      327.99    335.24   335.76   523.71   486.41        253.29    260.86
> > 256K      327.06    334.24   338.64   533.76   509.85        267.78    266.22
> > 512K      337.36    330.61   334.95   512.90   496.35        280.42    241.43
> > 
> >          guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
> >                                                                       TCP_NODELAY
> > 64          0.90      0.91     1.37     1.32     1.35          2.15      2.13
> > 256         3.59      3.55     5.23     5.19     5.29          8.50      8.89
> > 512         7.19      7.08    10.21     9.95    10.38         16.74     14.71
> > 1K         14.15     14.34    19.85    19.06    19.33         31.44     28.11
> > 2K         28.44     29.09    37.78    37.18    37.49         53.07     50.63
> > 4K         55.37     57.60    71.02    69.27    70.97         81.56     79.32
> > 8K        105.58    100.45   111.95   124.68   123.61        120.85    118.66
> > 16K       141.63    138.24   137.67   187.41   190.20        160.43    163.00
> > 32K       147.56    143.09   138.48   296.41   301.04        214.64    223.94
> > 64K       144.81    143.27   138.49   433.98   462.26        298.86    269.71
> > 128K      150.14    147.99   146.85   511.36   514.29        350.17    298.09
> > 256K      156.69    152.25   148.69   542.19   549.97        326.42    333.32
> > 512K      157.29    153.35   152.22   546.52   533.24        315.55    302.27
> > 
> > [1] https://github.com/stefano-garzarella/iperf/
> 
> 
> Hi:
> 
> Do you have any explanation for why vsock is better here? Is this because of
> the mergeable buffers? If so, we need to test with mrg_rxbuf=off.

Hi Jason,

virtio-net stays faster for packet sizes up to 16K/32K. Maybe, as you
suggested, this is related to the mergeable buffers.

I'll try to disable it and re-run the tests.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-13  9:58   ` Jason Wang
@ 2019-05-13 17:23     ` Stefano Garzarella
  2019-05-14  3:25       ` Jason Wang
  2019-05-14  3:25       ` Jason Wang
  2019-05-13 17:23     ` Stefano Garzarella
  1 sibling, 2 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-13 17:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi

On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
> 
> On 2019/5/10 下午8:58, Stefano Garzarella wrote:
> > Since virtio-vsock was introduced, the buffers filled by the host
> > and pushed to the guest using the vring are queued directly in
> > a per-socket list, avoiding a copy.
> > These buffers are preallocated by the guest with a fixed
> > size (4 KB).
> > 
> > The maximum amount of memory used by each socket should be
> > controlled by the credit mechanism.
> > The default credit available per-socket is 256 KB, but if we use
> > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > buffers, using up to 1 GB of memory per-socket. In addition, the
> > guest will continue to fill the vring with new 4 KB free buffers
> > to avoid starvation of other sockets.
> > 
> > This patch solves this issue by copying the payload into a new buffer.
> > The new buffer is then queued in the per-socket list, and the 4 KB buffer
> > used by the host is freed.
> > 
> > In this way, the memory used by each socket respects the available
> > credit, and we still avoid starvation, at the cost of an extra memory
> > copy. When the buffer is completely full we do a "zero-copy", moving
> > the buffer directly into the per-socket list.
> 
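The worst case described in the commit message can be checked with a few lines of arithmetic. This is a userspace sketch, not driver code: the constants mirror the values quoted above (256 KB of credit, 4 KB RX buffers), and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the commit message; the names are assumptions. */
#define SKETCH_CREDIT_BYTES	(256u * 1024u)	/* default per-socket credit */
#define SKETCH_RX_BUF_BYTES	(4u * 1024u)	/* preallocated RX buffer size */

/* Memory held when every queued packet keeps its whole RX buffer
 * (the behaviour before the patch). */
static uint64_t worst_case_without_copy(uint32_t payload_per_pkt)
{
	assert(payload_per_pkt > 0);
	uint64_t pkts = SKETCH_CREDIT_BYTES / payload_per_pkt;

	return pkts * SKETCH_RX_BUF_BYTES;
}

/* Memory held when each payload is copied into a right-sized buffer
 * (the behaviour with the patch): bounded by the credit itself. */
static uint64_t worst_case_with_copy(uint32_t payload_per_pkt)
{
	assert(payload_per_pkt > 0);
	uint64_t pkts = SKETCH_CREDIT_BYTES / payload_per_pkt;

	return pkts * payload_per_pkt;
}
```

With 1-byte payloads the first function gives 262144 * 4096 bytes = 1 GiB, while the second stays at 256 KiB. Both figures ignore the per-buffer bookkeeping structures.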
> 
> I wonder whether in the long run we should use the generic socket accounting
> mechanism provided by the kernel (e.g. socket, skb, sndbuf, rcvbuf, truesize)
> instead of a vsock-specific one, to avoid duplicating effort.

I agree; the idea is to switch to sk_buff, but this would require a huge
change. If we use the virtio-net datapath, it will become simpler.

> 
> 
> > 
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> >   drivers/vhost/vsock.c                   |  2 +
> >   include/linux/virtio_vsock.h            |  8 +++
> >   net/vmw_vsock/virtio_transport.c        |  1 +
> >   net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
> >   4 files changed, 81 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index bb5fc0e9fbc2..7964e2daee09 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> >   		return NULL;
> >   	}
> > +	pkt->buf_len = pkt->len;
> > +
> >   	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
> >   	if (nbytes != pkt->len) {
> >   		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index e223e2632edd..345f04ee9193 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
> >   	void *buf;
> >   	u32 len;
> >   	u32 off;
> > +	u32 buf_len;
> >   	bool reply;
> >   };
> > +struct virtio_vsock_buf {
> > +	struct list_head list;
> > +	void *addr;
> > +	u32 len;
> > +	u32 off;
> > +};
> > +
> >   struct virtio_vsock_pkt_info {
> >   	u32 remote_cid, remote_port;
> >   	struct vsock_sock *vsk;
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index 15eb5d3d4750..af1d2ce12f54 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> >   			break;
> >   		}
> > +		pkt->buf_len = buf_len;
> >   		pkt->len = buf_len;
> >   		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 602715fc9a75..0248d6808755 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> >   		pkt->buf = kmalloc(len, GFP_KERNEL);
> >   		if (!pkt->buf)
> >   			goto out_pkt;
> > +
> > +		pkt->buf_len = len;
> > +
> >   		err = memcpy_from_msg(pkt->buf, info->msg, len);
> >   		if (err)
> >   			goto out;
> > @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> >   	return NULL;
> >   }
> > +static struct virtio_vsock_buf *
> > +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> > +{
> > +	struct virtio_vsock_buf *buf;
> > +
> > +	if (pkt->len == 0)
> > +		return NULL;
> > +
> > +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> > +	if (!buf)
> > +		return NULL;
> > +
> > +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> > +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> > +	 * we are not use more memory than that counted by the credit mechanism.
> > +	 */
> > +	if (zero_copy && pkt->len == pkt->buf_len) {
> > +		buf->addr = pkt->buf;
> > +		pkt->buf = NULL;
> > +	} else {
> 
> 
> Is the copy still needed if we're just a few bytes short? We hit a similar
> issue in virtio-net, and virtio-net solves it by always copying the first
> 128 bytes for big packets.
> 
> See receive_big()

I see, it is more sophisticated.
IIUC, virtio-net allocates an sk_buff with a 128-byte buffer, copies the
first 128 bytes into it, then adds the buffer used to receive the packet as a
frag to the skb.

Do you suggest implementing something similar now, or can we use my
approach for now and reuse the virtio-net approach if we merge the
datapaths?
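
For reference, the move-or-copy decision in this patch can be modeled in userspace roughly as follows. This is a simplified sketch: the structure and function names are hypothetical stand-ins, malloc()/free() replace the kernel allocators, and the copy branch follows the commit message description because the quoted hunk stops at the "else".

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the structures touched by the patch. */
struct rx_pkt {
	void *buf;
	uint32_t len;		/* payload bytes written by the host */
	uint32_t buf_len;	/* size of the allocated RX buffer */
};

struct sock_buf {
	void *addr;
	uint32_t len;
	bool moved;	/* illustration only: true if no copy was made */
};

/* Returns 0 on success, -1 for an empty packet or allocation failure. */
static int queue_payload(struct rx_pkt *pkt, struct sock_buf *out)
{
	if (pkt->len == 0)
		return -1;

	if (pkt->len == pkt->buf_len) {
		/* Full buffer: take ownership of it, no copy needed. */
		out->addr = pkt->buf;
		out->moved = true;
	} else {
		/* Partial buffer: copy into a right-sized allocation and
		 * release the larger RX buffer. */
		out->addr = malloc(pkt->len);
		if (!out->addr)
			return -1;
		memcpy(out->addr, pkt->buf, pkt->len);
		free(pkt->buf);
		out->moved = false;
	}
	pkt->buf = NULL;
	out->len = pkt->len;
	return 0;
}
```

In this model a buffer that is even one byte short of full takes the copy path, which is the case Jason's question is about.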

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 7/8] vsock/virtio: increase RX buffer size to 64 KiB
  2019-05-13 10:01   ` Jason Wang
  2019-05-13 17:51     ` Stefano Garzarella
@ 2019-05-13 17:51     ` Stefano Garzarella
  2019-05-14  3:38       ` Jason Wang
  2019-05-14  3:38       ` Jason Wang
  1 sibling, 2 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-13 17:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi

On Mon, May 13, 2019 at 06:01:52PM +0800, Jason Wang wrote:
> 
> On 2019/5/10 下午8:58, Stefano Garzarella wrote:
> > In order to increase host -> guest throughput with large packets,
> > we can use 64 KiB RX buffers.
> > 
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> >   include/linux/virtio_vsock.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 84b72026d327..5a9d25be72df 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -10,7 +10,7 @@
> >   #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
> >   #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
> >   #define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
> > -#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
> > +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
> >   #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
> >   #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
> 
> 
> We probably don't want such a high-order allocation. It's better to switch to
> order-0 pages in this case. See add_recvbuf_big() for virtio-net. If we
> get the datapath unified, we will get more stuff set.

IIUC, you are suggesting allocating only single pages, putting them in a
scatterlist, and then adding them to the virtqueue.

Is that correct?
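
To put numbers on the high-order concern, here is a small userspace sketch. It assumes 4 KiB pages and reimplements the order calculation instead of calling the kernel's get_order(); all names are hypothetical.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Smallest order such that (page size << order) covers size bytes;
 * a userspace reimplementation of the idea behind get_order(). */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	while (((size_t)SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Number of order-0 pages needed to cover size bytes. */
static size_t sketch_pages_needed(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Under these assumptions, a physically contiguous 64 KiB RX buffer is an order-4 allocation, while the same space can be covered by 16 independent order-0 pages placed in a scatterlist.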

The issue I have here is that the virtio-vsock guest driver (see
virtio_vsock_rx_fill()) allocates a struct virtio_vsock_pkt that
contains room for the header, then allocates the buffer for the payload.
It then fills the scatterlist with &virtio_vsock_pkt.hdr and the
payload buffer.

Changing this will require several modifications, and if we get the datapath
unified, I'm not sure it's worth it.
Of course, if we leave the datapaths separate, I'd like to do that later.

What do you think?

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-13 17:23     ` Stefano Garzarella
  2019-05-14  3:25       ` Jason Wang
@ 2019-05-14  3:25       ` Jason Wang
  2019-05-14  3:40         ` Jason Wang
                           ` (3 more replies)
  1 sibling, 4 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-14  3:25 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/14 上午1:23, Stefano Garzarella wrote:
> On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
>> On 2019/5/10 下午8:58, Stefano Garzarella wrote:
>>> Since virtio-vsock was introduced, the buffers filled by the host
>>> and pushed to the guest using the vring are queued directly in
>>> a per-socket list, avoiding a copy.
>>> These buffers are preallocated by the guest with a fixed
>>> size (4 KB).
>>>
>>> The maximum amount of memory used by each socket should be
>>> controlled by the credit mechanism.
>>> The default credit available per-socket is 256 KB, but if we use
>>> only 1 byte per packet, the guest can queue up to 262144 of 4 KB
>>> buffers, using up to 1 GB of memory per-socket. In addition, the
>>> guest will continue to fill the vring with new 4 KB free buffers
>>> to avoid starvation of other sockets.
>>>
>>> This patch solves this issue by copying the payload into a new buffer.
>>> The new buffer is then queued in the per-socket list, and the 4 KB buffer
>>> used by the host is freed.
>>>
>>> In this way, the memory used by each socket respects the available
>>> credit, and we still avoid starvation, at the cost of an extra memory
>>> copy. When the buffer is completely full we do a "zero-copy", moving
>>> the buffer directly into the per-socket list.
>>
>> I wonder whether in the long run we should use the generic socket accounting
>> mechanism provided by the kernel (e.g. socket, skb, sndbuf, rcvbuf, truesize)
>> instead of a vsock-specific one, to avoid duplicating effort.
> I agree; the idea is to switch to sk_buff, but this would require a huge
> change. If we use the virtio-net datapath, it will become simpler.


Yes, the unix domain socket is one example that uses the generic skb and
socket structures. And we probably need some kind of socket pair on the host.
Using sockets can also simplify the unification with vhost-net, which depends
on the socket proto_ops to work. I admit it's probably a huge change; we
can do it gradually.


>>
>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>> ---
>>>    drivers/vhost/vsock.c                   |  2 +
>>>    include/linux/virtio_vsock.h            |  8 +++
>>>    net/vmw_vsock/virtio_transport.c        |  1 +
>>>    net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
>>>    4 files changed, 81 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> index bb5fc0e9fbc2..7964e2daee09 100644
>>> --- a/drivers/vhost/vsock.c
>>> +++ b/drivers/vhost/vsock.c
>>> @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>>>    		return NULL;
>>>    	}
>>> +	pkt->buf_len = pkt->len;
>>> +
>>>    	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
>>>    	if (nbytes != pkt->len) {
>>>    		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>> index e223e2632edd..345f04ee9193 100644
>>> --- a/include/linux/virtio_vsock.h
>>> +++ b/include/linux/virtio_vsock.h
>>> @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
>>>    	void *buf;
>>>    	u32 len;
>>>    	u32 off;
>>> +	u32 buf_len;
>>>    	bool reply;
>>>    };
>>> +struct virtio_vsock_buf {
>>> +	struct list_head list;
>>> +	void *addr;
>>> +	u32 len;
>>> +	u32 off;
>>> +};
>>> +
>>>    struct virtio_vsock_pkt_info {
>>>    	u32 remote_cid, remote_port;
>>>    	struct vsock_sock *vsk;
>>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>>> index 15eb5d3d4750..af1d2ce12f54 100644
>>> --- a/net/vmw_vsock/virtio_transport.c
>>> +++ b/net/vmw_vsock/virtio_transport.c
>>> @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>>>    			break;
>>>    		}
>>> +		pkt->buf_len = buf_len;
>>>    		pkt->len = buf_len;
>>>    		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index 602715fc9a75..0248d6808755 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>>>    		pkt->buf = kmalloc(len, GFP_KERNEL);
>>>    		if (!pkt->buf)
>>>    			goto out_pkt;
>>> +
>>> +		pkt->buf_len = len;
>>> +
>>>    		err = memcpy_from_msg(pkt->buf, info->msg, len);
>>>    		if (err)
>>>    			goto out;
>>> @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>>>    	return NULL;
>>>    }
>>> +static struct virtio_vsock_buf *
>>> +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
>>> +{
>>> +	struct virtio_vsock_buf *buf;
>>> +
>>> +	if (pkt->len == 0)
>>> +		return NULL;
>>> +
>>> +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
>>> +	if (!buf)
>>> +		return NULL;
>>> +
>>> +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
>>> +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
>>> +	 * we are not use more memory than that counted by the credit mechanism.
>>> +	 */
>>> +	if (zero_copy && pkt->len == pkt->buf_len) {
>>> +		buf->addr = pkt->buf;
>>> +		pkt->buf = NULL;
>>> +	} else {
>>
>> Is the copy still needed if we're just a few bytes short? We met a similar
>> issue in virtio-net, and virtio-net solves it by always copying the first
>> 128 bytes for big packets.
>>
>> See receive_big()
> I see, it is more sophisticated.
> IIUC, virtio-net allocates an sk_buff with 128 bytes of buffer, then copies the
> first 128 bytes, then adds the buffer used to receive the packet as a frag to
> the skb.


Yes, and the point is that if the packet is smaller than 128 bytes, the
pages will be recycled.
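
To make the copy-the-head idea concrete, here is a rough user-space model
in C. It is only a sketch of the behaviour described above, not the
virtio-net code: the names (rx_result, rx_receive, COPY_LEN) are invented
for illustration, and a plain byte array stands in for the RX page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COPY_LEN 128 /* bytes always copied, as in the discussion above */

/* Hypothetical result of receiving one packet from an RX page. */
struct rx_result {
	unsigned char head[COPY_LEN]; /* small private copy of the start */
	size_t head_len;              /* bytes valid in head[] */
	const unsigned char *frag;    /* rest of the payload, still in the page */
	size_t frag_len;              /* 0 when everything fit in head[] */
	bool page_recyclable;         /* true when the page holds no live data */
};

/* Copy at most COPY_LEN bytes out of the page; reference the remainder. */
static struct rx_result rx_receive(const unsigned char *page, size_t len)
{
	struct rx_result r;

	assert(page != NULL);
	memset(&r, 0, sizeof(r));

	r.head_len = len < COPY_LEN ? len : COPY_LEN;
	memcpy(r.head, page, r.head_len);

	r.frag_len = len - r.head_len;
	r.frag = r.frag_len ? page + r.head_len : NULL;

	/* A packet that fits entirely in the copied head leaves the page free. */
	r.page_recyclable = (r.frag_len == 0);
	return r;
}
```

In this model a 1-byte packet costs a 1-byte copy and gives the page back
immediately, while a full 4 KB packet copies only 128 bytes and keeps the
page attached for the rest.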


>
> Do you suggest implementing something similar, or can we use my approach
> for now and, if we merge the datapaths, reuse the virtio-net
> approach?


I think we need a better threshold. If I understand the patch correctly,
we will copy unless the packet is 64K when the guest is receiving.
A 1-byte packet is indeed a problem, but we need to solve it without
losing too much performance.
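
One way to reason about such a threshold is sketched below in user-space
C. This is an illustration under stated assumptions, not code from the
patch: the threshold parameter and both function names are hypothetical.
The patch as posted behaves like threshold == buf_len, and the pre-patch
driver like threshold == 1. Per-buffer metadata is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Move (zero-copy) the RX buffer only when the payload fills at least
 * `threshold` bytes of it; otherwise copy into an exact-size buffer.
 */
static bool rx_should_move(uint32_t len, uint32_t buf_len, uint32_t threshold)
{
	assert(threshold > 0 && threshold <= buf_len);
	return len >= threshold && len <= buf_len;
}

/* Upper bound on payload memory held by one socket.
 *
 * A moved buffer pins buf_len bytes while consuming at least `threshold`
 * bytes of credit, so at most credit / threshold buffers can be moved.
 * Copied payloads occupy exactly their length, so the leftover credit
 * (credit % threshold) adds at most that many bytes.
 */
static uint64_t rx_worst_case_bytes(uint32_t credit, uint32_t buf_len,
				    uint32_t threshold)
{
	assert(threshold > 0 && threshold <= buf_len);
	return (uint64_t)(credit / threshold) * buf_len + credit % threshold;
}
```

With a 256 KB credit and 4 KB buffers, the bound is 1 GiB for a threshold
of 1 byte (the case from the commit message), 512 KiB for a threshold of
2 KB, and 256 KiB when only completely full buffers are moved. A threshold
of half the buffer therefore caps the overhead at twice the credit while
avoiding the copy for every buffer that is at least half full.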

Thanks


>
> Thanks,
> Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 7/8] vsock/virtio: increase RX buffer size to 64 KiB
  2019-05-13 17:51     ` Stefano Garzarella
@ 2019-05-14  3:38       ` Jason Wang
  2019-05-14 16:20         ` Stefano Garzarella
  2019-05-14 16:20         ` Stefano Garzarella
  2019-05-14  3:38       ` Jason Wang
  1 sibling, 2 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-14  3:38 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/14 上午1:51, Stefano Garzarella wrote:
> On Mon, May 13, 2019 at 06:01:52PM +0800, Jason Wang wrote:
>> On 2019/5/10 下午8:58, Stefano Garzarella wrote:
>>> In order to increase host -> guest throughput with large packets,
>>> we can use 64 KiB RX buffers.
>>>
>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>> ---
>>>    include/linux/virtio_vsock.h | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>> index 84b72026d327..5a9d25be72df 100644
>>> --- a/include/linux/virtio_vsock.h
>>> +++ b/include/linux/virtio_vsock.h
>>> @@ -10,7 +10,7 @@
>>>    #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
>>>    #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
>>>    #define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
>>> -#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
>>> +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
>>>    #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
>>>    #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
>>
>> We probably don't want such a high-order allocation. It's better to switch to
>> order-0 pages in this case. See add_recvbuf_big() in virtio-net. If we
>> get the datapaths unified, we will get more things sorted out.
> IIUC, you are suggesting to allocate only pages and put them in a
> scatterlist, then add them to the virtqueue.
>
> Is that correct?


Yes, since you are using:

                 pkt->buf = kmalloc(buf_len, GFP_KERNEL);
                 if (!pkt->buf) {
                         virtio_transport_free_pkt(pkt);
                         break;
                 }

This is likely to fail when memory is fragmented, which is kind of
fragile.
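
For a sense of scale, the small user-space sketch below computes the
allocation order a contiguous buffer needs and the number of order-0 pages
that would replace it. It assumes 4 KiB pages; alloc_order() only mimics
the spirit of the kernel's get_order(), and both helper names are made up
for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Smallest order such that (MODEL_PAGE_SIZE << order) >= size. */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while (((size_t)MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Number of independent order-0 pages needed to hold `size` bytes. */
static size_t order0_pages(size_t size)
{
	assert(size > 0);
	return (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

Under these assumptions a 64 KiB buffer is an order-4 request, i.e. 16
physically contiguous pages, whereas the same payload can be covered by 16
separate order-0 pages that do not need to be adjacent.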


>
> The issue that I have here is that the virtio-vsock guest driver, see
> virtio_vsock_rx_fill(), allocates a struct virtio_vsock_pkt that
> contains room for the header, then allocates the buffer for the payload.
> At this point it fills the scatterlist with &virtio_vsock_pkt.hdr and the
> buffer for the payload.


This part should be fine, since all that is needed is to add more pages
to sg[] and call virtqueue_add_sgs().


>
> Changing this will require several modifications, and if we get the datapaths
> unified, I'm not sure it's worth it.
> Of course, if we leave the datapaths separate, I'd like to do that later.
>
> What do you think?


For the driver itself, it should not be hard. But I think you mean the
issue that e.g. virtio_vsock_pkt itself doesn't support sg. For the short
term, maybe we can use kvec instead.
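
As a rough picture of what a kvec-style payload would mean for the
receive path, the user-space sketch below reads bytes out of a payload
split across several segments. It is an assumption-laden model: struct seg
merely has the same shape as the kernel's struct kvec, and segs_copy_out()
is a hypothetical helper, not an existing vsock function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the kernel's struct kvec, redefined here for user space. */
struct seg {
	void *base;
	size_t len;
};

/* Copy up to `want` bytes, starting at payload offset `off`, out of a
 * segmented buffer. Returns the number of bytes actually copied.
 */
static size_t segs_copy_out(const struct seg *segs, size_t nsegs, size_t off,
			    void *dst, size_t want)
{
	size_t copied = 0;

	assert(segs != NULL && dst != NULL);

	for (size_t i = 0; i < nsegs && copied < want; i++) {
		size_t n;

		/* Skip segments that lie entirely before the offset. */
		if (off >= segs[i].len) {
			off -= segs[i].len;
			continue;
		}

		n = segs[i].len - off;
		if (n > want - copied)
			n = want - copied;

		memcpy((char *)dst + copied, (const char *)segs[i].base + off, n);
		copied += n;
		off = 0; /* later segments are read from their start */
	}
	return copied;
}
```

The only change a consumer sees compared with a single contiguous buffer
is the walk over segments; the running offset (the role pkt->off plays
today) still identifies how much of the payload has been dequeued.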

Thanks


>
> Thanks,
> Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-14  3:25       ` Jason Wang
  2019-05-14  3:40         ` Jason Wang
@ 2019-05-14  3:40         ` Jason Wang
  2019-05-14 16:35         ` Stefano Garzarella
  2019-05-14 16:35         ` Stefano Garzarella
  3 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-14  3:40 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/14 上午11:25, Jason Wang wrote:
>
> On 2019/5/14 上午1:23, Stefano Garzarella wrote:
>> On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
>>> On 2019/5/10 下午8:58, Stefano Garzarella wrote:
>>>> Since virtio-vsock was introduced, the buffers filled by the host
>>>> and pushed to the guest using the vring are directly queued in
>>>> a per-socket list, avoiding a copy.
>>>> These buffers are preallocated by the guest with a fixed
>>>> size (4 KB).
>>>>
>>>> The maximum amount of memory used by each socket should be
>>>> controlled by the credit mechanism.
>>>> The default credit available per-socket is 256 KB, but if we use
>>>> only 1 byte per packet, the guest can queue up to 262144 of 4 KB
>>>> buffers, using up to 1 GB of memory per-socket. In addition, the
>>>> guest will continue to fill the vring with new 4 KB free buffers
>>>> to avoid starvation of its sockets.
>>>>
>>>> This patch solves this issue by copying the payload into a new buffer.
>>>> The new buffer is then queued in the per-socket list, and the 4 KB
>>>> buffer used by the host is freed.
>>>>
>>>> In this way, the memory used by each socket respects the credit
>>>> available, and we still avoid starvation, paying the cost of an
>>>> extra memory copy. When the buffer is completely full we do a
>>>> "zero-copy", moving the buffer directly into the per-socket list.
>>>
>>> I wonder whether, in the long run, we should use the generic socket
>>> accounting mechanism provided by the kernel (e.g. socket, skb, sndbuf,
>>> rcvbuf, truesize) instead of a vsock-specific one, to avoid duplicating
>>> effort.
>> I agree, the idea is to switch to sk_buff, but this would require a huge
>> change. If we use the virtio-net datapath, it will become simpler.
>
>
> Yes, the unix domain socket is one example that uses the generic skb and
> socket structures. And we probably need some kind of socket pair on the
> host. Using sockets can also simplify the unification with vhost-net,
> which depends on the socket proto_ops to work. I admit it's probably a
> huge change, but we can do it gradually.
>
>
>>>
>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>> ---
>>>>    drivers/vhost/vsock.c                   |  2 +
>>>>    include/linux/virtio_vsock.h            |  8 +++
>>>>    net/vmw_vsock/virtio_transport.c        |  1 +
>>>>    net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
>>>>    4 files changed, 81 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>>> index bb5fc0e9fbc2..7964e2daee09 100644
>>>> --- a/drivers/vhost/vsock.c
>>>> +++ b/drivers/vhost/vsock.c
>>>> @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>>>>            return NULL;
>>>>        }
>>>> +    pkt->buf_len = pkt->len;
>>>> +
>>>>        nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
>>>>        if (nbytes != pkt->len) {
>>>>            vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
>>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>>> index e223e2632edd..345f04ee9193 100644
>>>> --- a/include/linux/virtio_vsock.h
>>>> +++ b/include/linux/virtio_vsock.h
>>>> @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
>>>>        void *buf;
>>>>        u32 len;
>>>>        u32 off;
>>>> +    u32 buf_len;
>>>>        bool reply;
>>>>    };
>>>> +struct virtio_vsock_buf {
>>>> +    struct list_head list;
>>>> +    void *addr;
>>>> +    u32 len;
>>>> +    u32 off;
>>>> +};
>>>> +
>>>>    struct virtio_vsock_pkt_info {
>>>>        u32 remote_cid, remote_port;
>>>>        struct vsock_sock *vsk;
>>>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>>>> index 15eb5d3d4750..af1d2ce12f54 100644
>>>> --- a/net/vmw_vsock/virtio_transport.c
>>>> +++ b/net/vmw_vsock/virtio_transport.c
>>>> @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>>>>                break;
>>>>            }
>>>> +        pkt->buf_len = buf_len;
>>>>            pkt->len = buf_len;
>>>>            sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>> index 602715fc9a75..0248d6808755 100644
>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>> @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>>>>            pkt->buf = kmalloc(len, GFP_KERNEL);
>>>>            if (!pkt->buf)
>>>>                goto out_pkt;
>>>> +
>>>> +        pkt->buf_len = len;
>>>> +
>>>>            err = memcpy_from_msg(pkt->buf, info->msg, len);
>>>>            if (err)
>>>>                goto out;
>>>> @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>>>>        return NULL;
>>>>    }
>>>> +static struct virtio_vsock_buf *
>>>> +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
>>>> +{
>>>> +    struct virtio_vsock_buf *buf;
>>>> +
>>>> +    if (pkt->len == 0)
>>>> +        return NULL;
>>>> +
>>>> +    buf = kzalloc(sizeof(*buf), GFP_KERNEL);
>>>> +    if (!buf)
>>>> +        return NULL;
>>>> +
>>>> +    /* If the buffer in the virtio_vsock_pkt is full, we can move it to
>>>> +     * the new virtio_vsock_buf avoiding the copy, because we are sure that
>>>> +     * we are not use more memory than that counted by the credit mechanism.
>>>> +     */
>>>> +    if (zero_copy && pkt->len == pkt->buf_len) {
>>>> +        buf->addr = pkt->buf;
>>>> +        pkt->buf = NULL;
>>>> +    } else {
>>>
>>> Is the copy still needed if we're just a few bytes short? We met a similar
>>> issue in virtio-net, and virtio-net solves it by always copying the first
>>> 128 bytes for big packets.
>>>
>>> See receive_big()
>> I see, it is more sophisticated.
>> IIUC, virtio-net allocates an sk_buff with 128 bytes of buffer, then copies
>> the first 128 bytes, then adds the buffer used to receive the packet as a
>> frag to the skb.
>
>
> Yes, and the point is that if the packet is smaller than 128 bytes, the
> pages will be recycled.


To be clear, this only works if you use order-0 pages instead of a large
buffer allocated through kmalloc(). That is another reason to require
order-0 pages.

Thanks



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 8/8] vsock/virtio: make the RX buffer size tunable
  2019-05-13 12:46     ` Jason Wang
@ 2019-05-14 16:10       ` Stefano Garzarella
  2019-05-14 16:10       ` Stefano Garzarella
  1 sibling, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-14 16:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi

On Mon, May 13, 2019 at 08:46:19PM +0800, Jason Wang wrote:
> 
> On 2019/5/13 下午6:05, Jason Wang wrote:
> > 
> > On 2019/5/10 下午8:58, Stefano Garzarella wrote:
> > > The RX buffer size determines the memory consumption of the
> > > vsock/virtio guest driver, so we make it tunable through
> > > a module parameter.
> > > 
> > > The sizes allowed are between 4 KB and 64 KB in order to be
> > > compatible with old host drivers.
> > > 
> > > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > 
> > 
> > I don't see much value in doing this through the kernel command line. We
> > should handle it automatically, as virtio-net does. Even a
> > module parameter would be better.
> > 
> > Thanks
> 
> 
> Sorry, I misread the patch. But even a module parameter is not
> flexible enough. We should handle it transparently.
> 

Okay, I'll try to understand how we can automatically adapt the RX
buffer size. Since the flow is stream-based, the receiver doesn't know the
original packet size.

Maybe I can reuse the EWMA approach to track whether the buffers are
entirely filled or not, and increase (e.g. double) or decrease the size
accordingly.
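
As an illustration only, the EWMA idea could be sketched like this in plain
userspace C. Everything here is an assumption made for the example and not
taken from the series: the struct and function names, the 1/8 weight and the
90%/25% thresholds. Only the 4 KB and 64 KB bounds come from the patch
description.

```c
#include <assert.h>
#include <stdint.h>

#define RX_BUF_MIN  (4u * 1024)   /* lower bound from the patch description */
#define RX_BUF_MAX  (64u * 1024)  /* upper bound from the patch description */
#define EWMA_WEIGHT 8u            /* hypothetical: a new sample counts for 1/8 */

/* Hypothetical per-device state for adapting the RX buffer size. */
struct rx_sizer {
	uint32_t buf_size;  /* size to use for newly allocated RX buffers */
	uint32_t avg_fill;  /* EWMA of the fill ratio, scaled to 0..1024 */
};

static void rx_sizer_init(struct rx_sizer *s)
{
	s->buf_size = RX_BUF_MIN;
	s->avg_fill = 512;  /* start in the middle, no resize pending */
}

/* Account one received buffer in which 'used' bytes were filled. */
static void rx_sizer_update(struct rx_sizer *s, uint32_t used)
{
	uint32_t fill;

	assert(used <= s->buf_size);

	fill = (uint32_t)(((uint64_t)used * 1024) / s->buf_size);
	s->avg_fill = (s->avg_fill * (EWMA_WEIGHT - 1) + fill) / EWMA_WEIGHT;

	if (s->avg_fill > 922 && s->buf_size < RX_BUF_MAX) {
		/* Buffers are on average more than ~90% full: double. */
		s->buf_size *= 2;
		s->avg_fill = 512;
	} else if (s->avg_fill < 256 && s->buf_size > RX_BUF_MIN) {
		/* Buffers are on average less than 25% full: halve. */
		s->buf_size /= 2;
		s->avg_fill = 512;
	}
}
```

With these made-up parameters, a stream of completely filled buffers grows
the size step by step up to 64 KB, and a stream of 1-byte packets shrinks it
back to 4 KB.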

I'll try to do it!

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 7/8] vsock/virtio: increase RX buffer size to 64 KiB
  2019-05-14  3:38       ` Jason Wang
  2019-05-14 16:20         ` Stefano Garzarella
@ 2019-05-14 16:20         ` Stefano Garzarella
  2019-05-15  2:50           ` Jason Wang
  2019-05-15  2:50           ` Jason Wang
  1 sibling, 2 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-14 16:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi

On Tue, May 14, 2019 at 11:38:05AM +0800, Jason Wang wrote:
> 
> On 2019/5/14 上午1:51, Stefano Garzarella wrote:
> > On Mon, May 13, 2019 at 06:01:52PM +0800, Jason Wang wrote:
> > > On 2019/5/10 下午8:58, Stefano Garzarella wrote:
> > > > In order to increase host -> guest throughput with large packets,
> > > > we can use 64 KiB RX buffers.
> > > > 
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > ---
> > > >    include/linux/virtio_vsock.h | 2 +-
> > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > > index 84b72026d327..5a9d25be72df 100644
> > > > --- a/include/linux/virtio_vsock.h
> > > > +++ b/include/linux/virtio_vsock.h
> > > > @@ -10,7 +10,7 @@
> > > >    #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
> > > >    #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
> > > >    #define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
> > > > -#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
> > > > +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
> > > >    #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
> > > >    #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
> > > 
> > > > We probably don't want such a high-order allocation. It's better to switch to
> > > > order-0 pages in this case. See add_recvbuf_big() in virtio-net. If we
> > > > get the datapath unified, we will get more of these things sorted out.
> > IIUC, you are suggesting to allocate only pages and put them in a
> > scatterlist, then add them to the virtqueue.
> > 
> > Is it correct?
> 
> 
> Yes since you are using:
> 
>                 pkt->buf = kmalloc(buf_len, GFP_KERNEL);
>                 if (!pkt->buf) {
>                         virtio_transport_free_pkt(pkt);
>                         break;
>                 }
> 
> This is likely to fail when memory is fragmented, which makes it
> fragile.
> 
> 

Thanks for pointing that out.

> > 
> > The issue that I have here, is that the virtio-vsock guest driver, see
> > virtio_vsock_rx_fill(), allocates a struct virtio_vsock_pkt that
> > contains the room for the header, then allocates the buffer for the payload.
> > At this point it fills the scatterlist with the &virtio_vsock_pkt.hdr and the
> > buffer for the payload.
> 
> 
> This part should be fine, since all that is needed is adding more pages to
> sg[] and calling virtqueue_add_sgs().
> 
> 

Yes, I agree.
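
To make the difference concrete, here is a rough userspace sketch. It
assumes 4 KiB pages, uses malloc() as a stand-in for page allocation, and
'struct seg', order_for(), fill_segs() and free_segs() are names invented
for this example. It shows that a 64 KiB contiguous buffer is an order-4
allocation, while the same payload can be described by a header entry plus
16 independent page-sized entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SZ  4096u  /* assumed page size */
#define MAX_SEGS 17     /* 1 header entry + 16 pages for 64 KiB */

/* Hypothetical stand-in for one scatterlist entry. */
struct seg {
	void   *addr;
	size_t  len;
};

/* Smallest order such that (PAGE_SZ << order) covers 'size' bytes. */
static unsigned int order_for(size_t size)
{
	unsigned int order = 0;

	while (((size_t)PAGE_SZ << order) < size)
		order++;
	return order;
}

/* Release the payload chunks of a segment list (entry 0 is the header). */
static void free_segs(struct seg *segs, int n)
{
	for (int i = 1; i < n; i++)
		free(segs[i].addr);
}

/*
 * Fill segs[] with the header followed by page-sized payload chunks.
 * Returns the number of entries used, or -1 if an allocation fails.
 */
static int fill_segs(struct seg *segs, void *hdr, size_t hdr_len,
		     size_t payload_len)
{
	int n = 0;

	assert(payload_len <= (size_t)(MAX_SEGS - 1) * PAGE_SZ);

	segs[n].addr = hdr;
	segs[n].len = hdr_len;
	n++;

	while (payload_len > 0) {
		size_t chunk = payload_len < PAGE_SZ ? payload_len : PAGE_SZ;
		void *page = malloc(PAGE_SZ);  /* one independent "page" */

		if (!page) {
			free_segs(segs, n);
			return -1;
		}
		segs[n].addr = page;
		segs[n].len = chunk;
		n++;
		payload_len -= chunk;
	}
	return n;
}
```

Each chunk is allocated on its own, so no single allocation depends on
finding 16 physically contiguous pages.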

> > 
> > Changing this will require several modifications, and if we get datapath
> > unified, I'm not sure it's worth it.
> > Of course, if we leave the datapaths separated, I'd like to do that later.
> > 
> > What do you think?
> 
> 
> For the driver itself, it should not be hard. But I think you mean the
> issue that e.g. virtio_vsock_pkt itself doesn't support sg. In the short
> term, maybe we can use kvec instead.

I'll try to use kvec in the virtio_vsock_pkt.
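
Just to illustrate what a kvec-based payload could look like, here is a
small userspace sketch. 'struct kvec_like' only mirrors the two fields of
the kernel's struct kvec (iov_base and iov_len); vec_copy_out() and the way
the offset is handled are assumptions for the example, not code from the
series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace mirror of the kernel's struct kvec. */
struct kvec_like {
	void   *iov_base;
	size_t  iov_len;
};

/*
 * Copy up to 'len' bytes of a segmented payload into 'dst', starting at
 * byte offset 'off' within the payload. Returns the bytes actually copied.
 */
static size_t vec_copy_out(const struct kvec_like *vec, int nvec,
			   size_t off, void *dst, size_t len)
{
	size_t copied = 0;

	assert(vec != NULL && dst != NULL);

	for (int i = 0; i < nvec && copied < len; i++) {
		size_t n;

		/* Skip segments that lie entirely before the offset. */
		if (off >= vec[i].iov_len) {
			off -= vec[i].iov_len;
			continue;
		}

		n = vec[i].iov_len - off;
		if (n > len - copied)
			n = len - copied;

		memcpy((char *)dst + copied,
		       (const char *)vec[i].iov_base + off, n);
		copied += n;
		off = 0;  /* later segments are read from their start */
	}
	return copied;
}
```

The reader side then works the same way whether the payload sits in one
buffer or in several page-sized segments.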

Since this struct is also shared with the host driver (vhost-vsock),
I hope the changes can be kept limited; otherwise we can drop the last 2
patches of the series for now, leaving the RX buffer size at 4 KB.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-14  3:25       ` Jason Wang
                           ` (2 preceding siblings ...)
  2019-05-14 16:35         ` Stefano Garzarella
@ 2019-05-14 16:35         ` Stefano Garzarella
  2019-05-15  2:48             ` Jason Wang
  3 siblings, 1 reply; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-14 16:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi

On Tue, May 14, 2019 at 11:25:34AM +0800, Jason Wang wrote:
> 
> On 2019/5/14 上午1:23, Stefano Garzarella wrote:
> > On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
> > > On 2019/5/10 下午8:58, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list avoiding to copy it.
> > > > These buffers are preallocated by the guest with a fixed
> > > > size (4 KB).
> > > > 
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of her sockets.
> > > > 
> > > > This patch solves this issue copying the payload in a new buffer.
> > > > Then it is queued in the per-socket list, and the 4KB buffer used
> > > > by the host is freed.
> > > > 
> > > > In this way, the memory used by each socket respects the credit
> > > > available, and we still avoid starvation, paying the cost of an
> > > > extra memory copy. When the buffer is completely full we do a
> > > > "zero-copy", moving the buffer directly in the per-socket list.
> > > 
> > > > I wonder whether, in the long run, we should use the generic socket accounting
> > > > mechanism provided by the kernel (e.g. socket, skb, sndbuf, recvbuf, truesize)
> > > > instead of a vsock-specific one, to avoid duplicating effort.
> > > I agree, the idea is to switch to sk_buff, but this would require a huge
> > > change. If we use the virtio-net datapath, it will become simpler.
> 
> 
> Yes, unix domain socket is one example that uses general skb and socket
> structure. And we probably need some kind of socket pair on host. Using
> socket can also simplify the unification with vhost-net which depends on the
> socket proto_ops to work. I admit it's a huge change probably, we can do it
> gradually.
> 

Yes, I also prefer to do this change gradually :)

> 
> > > 
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > ---
> > > >    drivers/vhost/vsock.c                   |  2 +
> > > >    include/linux/virtio_vsock.h            |  8 +++
> > > >    net/vmw_vsock/virtio_transport.c        |  1 +
> > > >    net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
> > > >    4 files changed, 81 insertions(+), 25 deletions(-)
> > > > 
> > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > index bb5fc0e9fbc2..7964e2daee09 100644
> > > > --- a/drivers/vhost/vsock.c
> > > > +++ b/drivers/vhost/vsock.c
> > > > @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> > > >    		return NULL;
> > > >    	}
> > > > +	pkt->buf_len = pkt->len;
> > > > +
> > > >    	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
> > > >    	if (nbytes != pkt->len) {
> > > >    		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
> > > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > > index e223e2632edd..345f04ee9193 100644
> > > > --- a/include/linux/virtio_vsock.h
> > > > +++ b/include/linux/virtio_vsock.h
> > > > @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
> > > >    	void *buf;
> > > >    	u32 len;
> > > >    	u32 off;
> > > > +	u32 buf_len;
> > > >    	bool reply;
> > > >    };
> > > > +struct virtio_vsock_buf {
> > > > +	struct list_head list;
> > > > +	void *addr;
> > > > +	u32 len;
> > > > +	u32 off;
> > > > +};
> > > > +
> > > >    struct virtio_vsock_pkt_info {
> > > >    	u32 remote_cid, remote_port;
> > > >    	struct vsock_sock *vsk;
> > > > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > > > index 15eb5d3d4750..af1d2ce12f54 100644
> > > > --- a/net/vmw_vsock/virtio_transport.c
> > > > +++ b/net/vmw_vsock/virtio_transport.c
> > > > @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> > > >    			break;
> > > >    		}
> > > > +		pkt->buf_len = buf_len;
> > > >    		pkt->len = buf_len;
> > > >    		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> > > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > > index 602715fc9a75..0248d6808755 100644
> > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> > > >    		pkt->buf = kmalloc(len, GFP_KERNEL);
> > > >    		if (!pkt->buf)
> > > >    			goto out_pkt;
> > > > +
> > > > +		pkt->buf_len = len;
> > > > +
> > > >    		err = memcpy_from_msg(pkt->buf, info->msg, len);
> > > >    		if (err)
> > > >    			goto out;
> > > > @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> > > >    	return NULL;
> > > >    }
> > > > +static struct virtio_vsock_buf *
> > > > +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> > > > +{
> > > > +	struct virtio_vsock_buf *buf;
> > > > +
> > > > +	if (pkt->len == 0)
> > > > +		return NULL;
> > > > +
> > > > +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> > > > +	if (!buf)
> > > > +		return NULL;
> > > > +
> > > > +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> > > > +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> > > > +	 * we are not use more memory than that counted by the credit mechanism.
> > > > +	 */
> > > > +	if (zero_copy && pkt->len == pkt->buf_len) {
> > > > +		buf->addr = pkt->buf;
> > > > +		pkt->buf = NULL;
> > > > +	} else {
> > > 
> > > > Is the copy still needed if we're just a few bytes short? We hit a similar issue
> > > > in virtio-net, and virtio-net solves it by always copying the first 128 bytes of
> > > > big packets.
> > > 
> > > See receive_big()
> > > I see, it is more sophisticated.
> > IIUC, virtio-net allocates a sk_buff with 128 bytes of buffer, then copies the
> > first 128 bytes, then adds the buffer used to receive the packet as a frag to
> > the skb.
> 
> 
> Yes and the point is if the packet is smaller than 128 bytes the pages will
> be recycled.
> 
> 

So it avoids the overhead of allocating a large buffer. Got it.

Just out of curiosity, why is the threshold 128 bytes?

> > 
> > > Do you suggest implementing something similar, or can we use my
> > > approach for now and, if we merge the datapaths, reuse the virtio-net
> > > approach?
> 
> 
> I think we need a better threshold. If I understand the patch correctly, we
> will copy unless the packet is 64K when the guest is receiving. The 1-byte
> packet is indeed a problem, but we need to solve it without losing too much
> performance.

That is correct. I'll try to figure out a better threshold and how to use
order-0 pages.
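
To reason about the trade-off, here is a back-of-the-envelope helper in
userspace C. It assumes a simplified model that is not taken from the
patch: payloads shorter than a hypothetical 'copy_threshold' are copied into
an exact-size buffer, everything else keeps its whole RX buffer, and
per-buffer metadata is ignored. worst_case_mem() is a name made up for the
example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case payload memory (in bytes) that one socket can pin for a
 * given credit, RX buffer size and copy threshold, under the model above.
 */
static uint64_t worst_case_mem(uint64_t credit, uint64_t buf_len,
			       uint64_t copy_threshold)
{
	uint64_t min_kept;

	assert(buf_len > 0);

	/* No payload can reach the threshold: everything is copied. */
	if (copy_threshold > buf_len)
		return credit;

	/*
	 * The worst case is a stream of the smallest packets that are
	 * still queued without a copy: each one pins a whole buffer.
	 */
	min_kept = copy_threshold ? copy_threshold : 1;
	return (credit / min_kept) * buf_len;
}
```

With 256 KB of credit and 4 KB buffers, a threshold of 0 (never copy) gives
the 1 GB from the commit message, a threshold equal to the buffer size (the
behaviour of this patch) gives 256 KB, and a threshold of 2 KB would give
512 KB, i.e. at most twice the credit.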

Thanks again for your advice,
Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-14 16:35         ` Stefano Garzarella
@ 2019-05-15  2:48             ` Jason Wang
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-15  2:48 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/15 上午12:35, Stefano Garzarella wrote:
> On Tue, May 14, 2019 at 11:25:34AM +0800, Jason Wang wrote:
>> On 2019/5/14 上午1:23, Stefano Garzarella wrote:
>>> On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
>>>> On 2019/5/10 下午8:58, Stefano Garzarella wrote:
>>>>> Since virtio-vsock was introduced, the buffers filled by the host
>>>>> and pushed to the guest using the vring, are directly queued in
>>>>> a per-socket list avoiding to copy it.
>>>>> These buffers are preallocated by the guest with a fixed
>>>>> size (4 KB).
>>>>>
>>>>> The maximum amount of memory used by each socket should be
>>>>> controlled by the credit mechanism.
>>>>> The default credit available per-socket is 256 KB, but if we use
>>>>> only 1 byte per packet, the guest can queue up to 262144 of 4 KB
>>>>> buffers, using up to 1 GB of memory per-socket. In addition, the
>>>>> guest will continue to fill the vring with new 4 KB free buffers
>>>>> to avoid starvation of her sockets.
>>>>>
>>>>> This patch solves this issue copying the payload in a new buffer.
>>>>> Then it is queued in the per-socket list, and the 4KB buffer used
>>>>> by the host is freed.
>>>>>
>>>>> In this way, the memory used by each socket respects the credit
>>>>> available, and we still avoid starvation, paying the cost of an
>>>>> extra memory copy. When the buffer is completely full we do a
>>>>> "zero-copy", moving the buffer directly in the per-socket list.
>>>> I wonder whether, in the long run, we should use the generic socket accounting
>>>> mechanism provided by the kernel (e.g. socket, skb, sndbuf, recvbuf, truesize)
>>>> instead of a vsock-specific one, to avoid duplicating effort.
>>> I agree, the idea is to switch to sk_buff, but this would require a huge
>>> change. If we use the virtio-net datapath, it will become simpler.
>>
>> Yes, unix domain socket is one example that uses general skb and socket
>> structure. And we probably need some kind of socket pair on host. Using
>> socket can also simplify the unification with vhost-net which depends on the
>> socket proto_ops to work. I admit it's a huge change probably, we can do it
>> gradually.
>>
> Yes, I also prefer to do this change gradually :)
>
>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>> ---
>>>>>     drivers/vhost/vsock.c                   |  2 +
>>>>>     include/linux/virtio_vsock.h            |  8 +++
>>>>>     net/vmw_vsock/virtio_transport.c        |  1 +
>>>>>     net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
>>>>>     4 files changed, 81 insertions(+), 25 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>>>> index bb5fc0e9fbc2..7964e2daee09 100644
>>>>> --- a/drivers/vhost/vsock.c
>>>>> +++ b/drivers/vhost/vsock.c
>>>>> @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>>>>>     		return NULL;
>>>>>     	}
>>>>> +	pkt->buf_len = pkt->len;
>>>>> +
>>>>>     	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
>>>>>     	if (nbytes != pkt->len) {
>>>>>     		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
>>>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>>>> index e223e2632edd..345f04ee9193 100644
>>>>> --- a/include/linux/virtio_vsock.h
>>>>> +++ b/include/linux/virtio_vsock.h
>>>>> @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
>>>>>     	void *buf;
>>>>>     	u32 len;
>>>>>     	u32 off;
>>>>> +	u32 buf_len;
>>>>>     	bool reply;
>>>>>     };
>>>>> +struct virtio_vsock_buf {
>>>>> +	struct list_head list;
>>>>> +	void *addr;
>>>>> +	u32 len;
>>>>> +	u32 off;
>>>>> +};
>>>>> +
>>>>>     struct virtio_vsock_pkt_info {
>>>>>     	u32 remote_cid, remote_port;
>>>>>     	struct vsock_sock *vsk;
>>>>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>>>>> index 15eb5d3d4750..af1d2ce12f54 100644
>>>>> --- a/net/vmw_vsock/virtio_transport.c
>>>>> +++ b/net/vmw_vsock/virtio_transport.c
>>>>> @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>>>>>     			break;
>>>>>     		}
>>>>> +		pkt->buf_len = buf_len;
>>>>>     		pkt->len = buf_len;
>>>>>     		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>>> index 602715fc9a75..0248d6808755 100644
>>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>>> @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>>>>>     		pkt->buf = kmalloc(len, GFP_KERNEL);
>>>>>     		if (!pkt->buf)
>>>>>     			goto out_pkt;
>>>>> +
>>>>> +		pkt->buf_len = len;
>>>>> +
>>>>>     		err = memcpy_from_msg(pkt->buf, info->msg, len);
>>>>>     		if (err)
>>>>>     			goto out;
>>>>> @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>>>>>     	return NULL;
>>>>>     }
>>>>> +static struct virtio_vsock_buf *
>>>>> +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
>>>>> +{
>>>>> +	struct virtio_vsock_buf *buf;
>>>>> +
>>>>> +	if (pkt->len == 0)
>>>>> +		return NULL;
>>>>> +
>>>>> +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
>>>>> +	if (!buf)
>>>>> +		return NULL;
>>>>> +
>>>>> +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
>>>>> +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
>>>>> +	 * we are not use more memory than that counted by the credit mechanism.
>>>>> +	 */
>>>>> +	if (zero_copy && pkt->len == pkt->buf_len) {
>>>>> +		buf->addr = pkt->buf;
>>>>> +		pkt->buf = NULL;
>>>>> +	} else {
>>>> Is the copy still needed if we're just a few bytes short? We hit a similar issue
>>>> in virtio-net, and virtio-net solves it by always copying the first 128 bytes of
>>>> big packets.
>>>>
>>>> See receive_big()
>>> I see, it is more sophisticated.
>>> IIUC, virtio-net allocates a sk_buff with 128 bytes of buffer, then copies the
>>> first 128 bytes, then adds the buffer used to receive the packet as a frag to
>>> the skb.
>>
>> Yes and the point is if the packet is smaller than 128 bytes the pages will
>> be recycled.
>>
>>
> So it avoids the overhead of allocating a large buffer. Got it.
>
> Just out of curiosity, why is the threshold 128 bytes?


From its name (GOOD_COPY_LEN), I think it's just a value that won't lose
much performance, e.g. the size of two cachelines.
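
As a rough userspace illustration of the copy-the-small-head idea (this is
not the actual receive_big() code; 'struct rx_result' and rx_small_copy()
are names made up for the example, and only the value 128 matches
GOOD_COPY_LEN in virtio-net):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GOOD_COPY_LEN 128  /* same value as in virtio-net */

/* Hypothetical result of receiving one packet from a page-sized buffer. */
struct rx_result {
	unsigned char head[GOOD_COPY_LEN];  /* small linear copy */
	size_t head_len;                    /* bytes copied into head[] */
	const unsigned char *frag;          /* rest left in the page, or NULL */
	size_t frag_len;
	int page_recyclable;                /* 1 if nothing still points at the page */
};

/* Copy at most GOOD_COPY_LEN bytes and reference the remainder in place. */
static void rx_small_copy(const unsigned char *page, size_t len,
			  struct rx_result *r)
{
	assert(page != NULL && r != NULL);

	r->head_len = len < GOOD_COPY_LEN ? len : GOOD_COPY_LEN;
	memcpy(r->head, page, r->head_len);

	r->frag_len = len - r->head_len;
	r->frag = r->frag_len ? page + r->head_len : NULL;

	/* A packet that fits in head[] leaves the page free for reuse. */
	r->page_recyclable = (r->frag_len == 0);
}
```

So a small packet costs one short memcpy and gives the page back
immediately, while a large packet only pays for copying its first 128 bytes.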

Thanks


>
>>> Do you suggest implementing something similar, or can we use my
>>> approach for now and, if we merge the datapaths, reuse the virtio-net
>>> approach?
>>
>> I think we need a better threshold. If I understand the patch correctly, we
>> will do copy unless the packet is 64K when guest is doing receiving. 1 byte
>> packet is indeed a problem, but we need to solve it without losing too much
>> performance.
> It is correct. I'll try to figure out a better threshold and the usage of
> order 0 page.
>
> Thanks again for your advices,
> Stefano
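
A rough way to reason about the threshold discussed above, as a sketch in
userspace C rather than kernel code. It assumes the figures from the patch
description (256 KB of credit per socket, 4 KB RX buffers) and a hypothetical
COPY_THRESHOLD of 128 bytes borrowed from virtio-net's GOOD_COPY_LEN; the
thread does not settle on a value. retained_bytes() and worst_case_bytes()
are illustrative names, not functions from the patch.

```c
#include <stdint.h>

/* Hypothetical threshold; the thread leaves the real value undecided. */
#define COPY_THRESHOLD 128u

/* Bytes of memory the receive queue retains for one packet.
 * Payloads below the threshold are copied into a right-sized buffer, so the
 * RX buffer can be recycled; larger payloads keep the whole RX buffer.
 */
static uint64_t retained_bytes(uint32_t payload_len, uint32_t rx_buf_len,
			       uint32_t threshold)
{
	if (payload_len == 0)
		return 0;
	if (payload_len < threshold)
		return payload_len;	/* copy path */
	return rx_buf_len;		/* move (zero-copy) path */
}

/* Memory retained when the whole credit is consumed by equal-sized packets. */
static uint64_t worst_case_bytes(uint32_t credit, uint32_t payload_len,
				 uint32_t rx_buf_len, uint32_t threshold)
{
	uint64_t npkts;

	if (payload_len == 0)
		return 0;

	npkts = credit / payload_len;
	return npkts * retained_bytes(payload_len, rx_buf_len, threshold);
}
```

With a threshold of 0 (never copy), 1-byte packets retain 262144 * 4 KB = 1 GB,
which matches the patch description; with copying enabled the same traffic
stays within 256 KB. Setting the threshold to the RX buffer size reproduces
the "copy unless the buffer is full" behaviour of the patch as posted. In this
model, payloads exactly at a 128-byte threshold become the new worst case
(2048 buffers of 4 KB, i.e. 8 MB), so the threshold trades copy cost against
how far memory use can exceed the credit.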

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 7/8] vsock/virtio: increase RX buffer size to 64 KiB
  2019-05-14 16:20         ` Stefano Garzarella
@ 2019-05-15  2:50           ` Jason Wang
  2019-05-15  8:22             ` Stefano Garzarella
  2019-05-15  8:22             ` Stefano Garzarella
  2019-05-15  2:50           ` Jason Wang
  1 sibling, 2 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-15  2:50 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi


On 2019/5/15 上午12:20, Stefano Garzarella wrote:
> On Tue, May 14, 2019 at 11:38:05AM +0800, Jason Wang wrote:
>> On 2019/5/14 上午1:51, Stefano Garzarella wrote:
>>> On Mon, May 13, 2019 at 06:01:52PM +0800, Jason Wang wrote:
>>>> On 2019/5/10 下午8:58, Stefano Garzarella wrote:
>>>>> In order to increase host -> guest throughput with large packets,
>>>>> we can use 64 KiB RX buffers.
>>>>>
>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>> ---
>>>>>     include/linux/virtio_vsock.h | 2 +-
>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>>>> index 84b72026d327..5a9d25be72df 100644
>>>>> --- a/include/linux/virtio_vsock.h
>>>>> +++ b/include/linux/virtio_vsock.h
>>>>> @@ -10,7 +10,7 @@
>>>>>     #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
>>>>>     #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
>>>>>     #define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
>>>>> -#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
>>>>> +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
>>>>>     #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
>>>>>     #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
>>>> We probably don't want such a high-order allocation. It's better to switch
>>>> to order-0 pages in this case. See add_recvbuf_big() for virtio-net. If we
>>>> get the datapath unified, we will get more of these things in place.
>>> IIUC, you are suggesting allocating only pages and putting them in a
>>> scatterlist, then adding them to the virtqueue.
>>>
>>> Is that correct?
>>
>> Yes, since you are using:
>>
>>                  pkt->buf = kmalloc(buf_len, GFP_KERNEL);
>>                  if (!pkt->buf) {
>>                          virtio_transport_free_pkt(pkt);
>>                          break;
>>                  }
>>
>> This is likely to fail when memory is fragmented, which is kind of
>> fragile.
>>
>>
> Thanks for pointing that out.
>
>>> The issue that I have here is that the virtio-vsock guest driver, see
>>> virtio_vsock_rx_fill(), allocates a struct virtio_vsock_pkt that
>>> contains room for the header, then allocates the buffer for the payload.
>>> At this point it fills the scatterlist with &virtio_vsock_pkt.hdr and the
>>> buffer for the payload.
>>
>> This part should be fine, since all that is needed is adding more pages to
>> sg[] and calling virtqueue_add_sgs().
>>
>>
> Yes, I agree.
>
>>> Changing this will require several modifications, and if we get the datapath
>>> unified, I'm not sure it's worth it.
>>> Of course, if we leave the datapaths separate, I'd like to do that later.
>>>
>>> What do you think?
>>
>> For the driver itself, it should not be hard. But I think you mean the
>> issue that e.g. virtio_vsock_pkt itself doesn't support sg. In the short
>> term, maybe we can use kvec instead.
> I'll try to use kvec in the virtio_vsock_pkt.
>
> Since this struct is also shared with the host driver (vhost-vsock),
> I hope the changes can be limited; otherwise we can remove the last 2
> patches of the series for now, leaving the RX buffer size at 4 KB.


Yes, and if it introduces too many changes, maybe we can do the 64 KB
buffer in the future with the conversion to skb, which supports page
frags natively.

Thanks


>
> Thanks,
> Stefano
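
The order-0 suggestion above can be sketched in userspace C, assuming 4 KiB
pages and using struct iovec from <sys/uio.h> as a stand-in for the kernel's
struct kvec. malloc() here only imitates page allocation; alloc_segs(),
free_segs() and SKETCH_PAGE_SIZE are made-up names, not part of the driver.

```c
#include <stdlib.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Number of page-sized segments needed to cover len bytes. */
static size_t segs_needed(size_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Free the first nsegs segments and the segment array itself. */
static void free_segs(struct iovec *iov, size_t nsegs)
{
	size_t i;

	if (!iov)
		return;
	for (i = 0; i < nsegs; i++)
		free(iov[i].iov_base);
	free(iov);
}

/* Cover len bytes with independent segments of at most one page each,
 * instead of one large contiguous allocation. Returns NULL on failure
 * or when len is 0; *nsegs is set to the number of segments.
 */
static struct iovec *alloc_segs(size_t len, size_t *nsegs)
{
	size_t n = segs_needed(len), i;
	struct iovec *iov;

	*nsegs = 0;
	if (n == 0)
		return NULL;

	iov = calloc(n, sizeof(*iov));
	if (!iov)
		return NULL;

	for (i = 0; i < n; i++) {
		size_t chunk = len < SKETCH_PAGE_SIZE ? len : SKETCH_PAGE_SIZE;

		iov[i].iov_base = malloc(chunk);
		if (!iov[i].iov_base) {
			free_segs(iov, i);	/* undo the segments so far */
			return NULL;
		}
		iov[i].iov_len = chunk;
		len -= chunk;
	}

	*nsegs = n;
	return iov;
}
```

A 64 KiB receive buffer becomes 16 separate segments in this model, so no
single allocation depends on finding contiguous free memory.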

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 7/8] vsock/virtio: increase RX buffer size to 64 KiB
  2019-05-15  2:50           ` Jason Wang
  2019-05-15  8:22             ` Stefano Garzarella
@ 2019-05-15  8:22             ` Stefano Garzarella
  1 sibling, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-15  8:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi

On Wed, May 15, 2019 at 10:50:43AM +0800, Jason Wang wrote:
> 
> On 2019/5/15 上午12:20, Stefano Garzarella wrote:
> > On Tue, May 14, 2019 at 11:38:05AM +0800, Jason Wang wrote:
> > > On 2019/5/14 上午1:51, Stefano Garzarella wrote:
> > > > On Mon, May 13, 2019 at 06:01:52PM +0800, Jason Wang wrote:
> > > > > On 2019/5/10 下午8:58, Stefano Garzarella wrote:
> > > > > > In order to increase host -> guest throughput with large packets,
> > > > > > we can use 64 KiB RX buffers.
> > > > > > 
> > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > ---
> > > > > >     include/linux/virtio_vsock.h | 2 +-
> > > > > >     1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > > > > index 84b72026d327..5a9d25be72df 100644
> > > > > > --- a/include/linux/virtio_vsock.h
> > > > > > +++ b/include/linux/virtio_vsock.h
> > > > > > @@ -10,7 +10,7 @@
> > > > > >     #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
> > > > > >     #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
> > > > > >     #define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
> > > > > > -#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
> > > > > > +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 64)
> > > > > >     #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
> > > > > >     #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
> > > > > We probably don't want such a high-order allocation. It's better to switch
> > > > > to order-0 pages in this case. See add_recvbuf_big() for virtio-net. If we
> > > > > get the datapath unified, we will get more of these things in place.
> > > > IIUC, you are suggesting allocating only pages and putting them in a
> > > > scatterlist, then adding them to the virtqueue.
> > > > 
> > > > Is that correct?
> > > 
> > > Yes, since you are using:
> > > 
> > >                  pkt->buf = kmalloc(buf_len, GFP_KERNEL);
> > >                  if (!pkt->buf) {
> > >                          virtio_transport_free_pkt(pkt);
> > >                          break;
> > >                  }
> > > 
> > > This is likely to fail when memory is fragmented, which is kind of
> > > fragile.
> > > 
> > > 
> > Thanks for pointing that out.
> > 
> > > > The issue that I have here is that the virtio-vsock guest driver, see
> > > > virtio_vsock_rx_fill(), allocates a struct virtio_vsock_pkt that
> > > > contains room for the header, then allocates the buffer for the payload.
> > > > At this point it fills the scatterlist with &virtio_vsock_pkt.hdr and the
> > > > buffer for the payload.
> > > 
> > > This part should be fine, since all that is needed is adding more pages to
> > > sg[] and calling virtqueue_add_sgs().
> > > 
> > > 
> > Yes, I agree.
> > 
> > > > Changing this will require several modifications, and if we get the datapath
> > > > unified, I'm not sure it's worth it.
> > > > Of course, if we leave the datapaths separate, I'd like to do that later.
> > > > 
> > > > What do you think?
> > > 
> > > For the driver itself, it should not be hard. But I think you mean the
> > > issue that e.g. virtio_vsock_pkt itself doesn't support sg. In the short
> > > term, maybe we can use kvec instead.
> > I'll try to use kvec in the virtio_vsock_pkt.
> > 
> > Since this struct is also shared with the host driver (vhost-vsock),
> > I hope the changes can be limited; otherwise we can remove the last 2
> > patches of the series for now, leaving the RX buffer size at 4 KB.
> 
> 
> Yes, and if it introduces too many changes, maybe we can do the 64 KB buffer
> in the future with the conversion to skb, which supports page frags
> natively.

Yes, I completely agree!

Thanks,
Stefano
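
For reference, the "high-order allocation" concern quoted above comes down to
simple arithmetic: kmalloc() returns physically contiguous memory, so the
request size determines how many contiguous pages must be free. The sketch
below is userspace C, assumes 4 KiB pages, and uses the made-up name
sketch_order(); it only mirrors the idea of a page order, not any kernel
helper.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Smallest order such that (1 << order) pages cover size bytes. */
static unsigned int sketch_order(size_t size)
{
	size_t pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;

	return order;
}
```

Under that assumption a 4 KB RX buffer is an order-0 request, while a 64 KiB
buffer is order 4, i.e. 16 contiguous pages per buffer.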

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-10 12:58 ` Stefano Garzarella
                     ` (2 preceding siblings ...)
  2019-05-13  9:58   ` Jason Wang
@ 2019-05-16 15:25   ` Stefan Hajnoczi
  2019-05-17  8:25     ` Stefano Garzarella
  2019-05-17  8:25     ` Stefano Garzarella
  2019-05-16 15:25   ` Stefan Hajnoczi
  4 siblings, 2 replies; 75+ messages in thread
From: Stefan Hajnoczi @ 2019-05-16 15:25 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Jason Wang

[-- Attachment #1: Type: text/plain, Size: 2164 bytes --]

On Fri, May 10, 2019 at 02:58:36PM +0200, Stefano Garzarella wrote:
> +struct virtio_vsock_buf {

Please add a comment describing the purpose of this struct and to
differentiate its use from struct virtio_vsock_pkt.

> +static struct virtio_vsock_buf *
> +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> +{
> +	struct virtio_vsock_buf *buf;
> +
> +	if (pkt->len == 0)
> +		return NULL;
> +
> +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> +	if (!buf)
> +		return NULL;
> +
> +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> +	 * we are not use more memory than that counted by the credit mechanism.
> +	 */
> +	if (zero_copy && pkt->len == pkt->buf_len) {
> +		buf->addr = pkt->buf;
> +		pkt->buf = NULL;
> +	} else {
> +		buf->addr = kmalloc(pkt->len, GFP_KERNEL);

buf and buf->addr could be allocated in a single call, though I'm not
sure how big an optimization this is.
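
One possible shape for the single-allocation idea, sketched in userspace C
with a flexible array member. struct sketch_buf and sketch_buf_copy() are
hypothetical names; this only covers the copy path, since the zero-copy path
reuses a payload buffer that already exists and cannot be folded into the
same allocation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of virtio_vsock_buf for the copy path only. */
struct sketch_buf {
	uint32_t len;
	uint32_t off;
	unsigned char data[];	/* payload lives in the same allocation */
};

/* Copy len bytes of payload into a freshly allocated sketch_buf.
 * Returns NULL for an empty payload or on allocation failure.
 */
static struct sketch_buf *sketch_buf_copy(const void *payload, uint32_t len)
{
	struct sketch_buf *buf;

	if (len == 0)
		return NULL;

	/* One call allocates the bookkeeping fields and the payload. */
	buf = malloc(sizeof(*buf) + len);
	if (!buf)
		return NULL;

	buf->len = len;
	buf->off = 0;
	memcpy(buf->data, payload, len);
	return buf;
}
```

A single free() releases both parts, at the cost of needing two different
layouts if the move and copy paths are to coexist.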

> @@ -841,20 +882,24 @@ virtio_transport_recv_connected(struct sock *sk,
>  {
>  	struct vsock_sock *vsk = vsock_sk(sk);
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> +	struct virtio_vsock_buf *buf;
>  	int err = 0;
>  
>  	switch (le16_to_cpu(pkt->hdr.op)) {
>  	case VIRTIO_VSOCK_OP_RW:
>  		pkt->len = le32_to_cpu(pkt->hdr.len);
> -		pkt->off = 0;
> +		buf = virtio_transport_alloc_buf(pkt, true);
>  
> -		spin_lock_bh(&vvs->rx_lock);
> -		virtio_transport_inc_rx_pkt(vvs, pkt);
> -		list_add_tail(&pkt->list, &vvs->rx_queue);
> -		spin_unlock_bh(&vvs->rx_lock);
> +		if (buf) {
> +			spin_lock_bh(&vvs->rx_lock);
> +			virtio_transport_inc_rx_pkt(vvs, pkt->len);
> +			list_add_tail(&buf->list, &vvs->rx_queue);
> +			spin_unlock_bh(&vvs->rx_lock);
>  
> -		sk->sk_data_ready(sk);
> -		return err;
> +			sk->sk_data_ready(sk);
> +		}

The return value of this function isn't used but the code still makes an
effort to return errors.  Please return -ENOMEM when buf == NULL.

If you'd like to remove the return value that's fine too, but please do
it for the whole function to be consistent.
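
A minimal userspace sketch of the error-propagation pattern requested above.
All names (sketch_queue, sketch_recv_rw(), failing_alloc()) are invented for
illustration, and the allocator is passed in only so the failure path can be
exercised. Note one choice made here that the review does not spell out: an
empty payload is treated as success rather than as -ENOMEM, even though the
quoted virtio_transport_alloc_buf() returns NULL in both cases.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_queue {
	size_t queued;	/* bytes accounted to the receive queue */
};

/* Account len bytes to the queue, reporting allocation failure. */
static int sketch_recv_rw(struct sketch_queue *q, size_t len,
			  void *(*alloc)(size_t))
{
	void *buf;

	if (len == 0)
		return 0;	/* nothing to queue, not an error */

	buf = alloc(len);
	if (!buf)
		return -ENOMEM;	/* propagate instead of silently dropping */

	q->queued += len;
	free(buf);		/* sketch only: real code would keep buf queued */
	return 0;
}

/* Allocator that always fails, to drive the error path. */
static void *failing_alloc(size_t len)
{
	(void)len;
	return NULL;
}
```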


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 2/8] vsock/virtio: free packets during the socket release
  2019-05-10 12:58 ` [PATCH v2 2/8] vsock/virtio: free packets during the socket release Stefano Garzarella
                     ` (2 preceding siblings ...)
  2019-05-16 15:32   ` Stefan Hajnoczi
@ 2019-05-16 15:32   ` Stefan Hajnoczi
  2019-05-17  8:26     ` Stefano Garzarella
  2019-05-17  8:26     ` Stefano Garzarella
  3 siblings, 2 replies; 75+ messages in thread
From: Stefan Hajnoczi @ 2019-05-16 15:32 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Jason Wang

[-- Attachment #1: Type: text/plain, Size: 484 bytes --]

On Fri, May 10, 2019 at 02:58:37PM +0200, Stefano Garzarella wrote:
> When the socket is released, we should free all packets
> queued in the per-socket list in order to avoid a memory
> leak.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Ouch, this would be nice as a separate patch that can be merged right
away (with s/virtio_vsock_buf/virtio_vsock_pkt/).
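
The fix described in the quoted commit message amounts to draining the
per-socket queue at release time. The patch body is not shown in this
message, so the following is only a userspace C sketch of that idea with
invented names (sketch_pkt, sketch_sock, sketch_enqueue(), sketch_release())
and a plain singly linked list in place of the kernel's list_head.

```c
#include <stdlib.h>

struct sketch_pkt {
	struct sketch_pkt *next;
	void *buf;
};

struct sketch_sock {
	struct sketch_pkt *rx_head;	/* packets not yet read by the user */
};

/* Queue one packet with a len-byte payload; returns 0 on success. */
static int sketch_enqueue(struct sketch_sock *sk, size_t len)
{
	struct sketch_pkt *pkt = malloc(sizeof(*pkt));

	if (!pkt)
		return -1;

	pkt->buf = malloc(len);
	if (!pkt->buf) {
		free(pkt);
		return -1;
	}

	pkt->next = sk->rx_head;
	sk->rx_head = pkt;
	return 0;
}

/* Free every packet still queued; returns how many were freed. */
static size_t sketch_release(struct sketch_sock *sk)
{
	size_t freed = 0;

	while (sk->rx_head) {
		struct sketch_pkt *pkt = sk->rx_head;

		sk->rx_head = pkt->next;
		free(pkt->buf);
		free(pkt);
		freed++;
	}

	return freed;
}
```

Without the loop in sketch_release(), any packet the application never read
would stay allocated after the socket is gone, which is the leak the patch
addresses.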


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-16 15:25   ` Stefan Hajnoczi
  2019-05-17  8:25     ` Stefano Garzarella
@ 2019-05-17  8:25     ` Stefano Garzarella
  2019-05-20  8:57       ` Stefan Hajnoczi
  2019-05-20  8:57       ` Stefan Hajnoczi
  1 sibling, 2 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-17  8:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Jason Wang

On Thu, May 16, 2019 at 04:25:33PM +0100, Stefan Hajnoczi wrote:
> On Fri, May 10, 2019 at 02:58:36PM +0200, Stefano Garzarella wrote:
> > +struct virtio_vsock_buf {
> 
> Please add a comment describing the purpose of this struct and to
> differentiate its use from struct virtio_vsock_pkt.
> 

Sure, I'll fix it.

> > +static struct virtio_vsock_buf *
> > +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> > +{
> > +	struct virtio_vsock_buf *buf;
> > +
> > +	if (pkt->len == 0)
> > +		return NULL;
> > +
> > +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> > +	if (!buf)
> > +		return NULL;
> > +
> > +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> > +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> > +	 * we are not using more memory than that counted by the credit mechanism.
> > +	 */
> > +	if (zero_copy && pkt->len == pkt->buf_len) {
> > +		buf->addr = pkt->buf;
> > +		pkt->buf = NULL;
> > +	} else {
> > +		buf->addr = kmalloc(pkt->len, GFP_KERNEL);
> 
> buf and buf->addr could be allocated in a single call, though I'm not
> sure how big an optimization this is.
> 

IIUC, in the zero-copy case I should allocate only the buf; otherwise,
when I'm doing a full copy, I should allocate both buf and buf->addr in
a single call.

Is that correct?

> > @@ -841,20 +882,24 @@ virtio_transport_recv_connected(struct sock *sk,
> >  {
> >  	struct vsock_sock *vsk = vsock_sk(sk);
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > +	struct virtio_vsock_buf *buf;
> >  	int err = 0;
> >  
> >  	switch (le16_to_cpu(pkt->hdr.op)) {
> >  	case VIRTIO_VSOCK_OP_RW:
> >  		pkt->len = le32_to_cpu(pkt->hdr.len);
> > -		pkt->off = 0;
> > +		buf = virtio_transport_alloc_buf(pkt, true);
> >  
> > -		spin_lock_bh(&vvs->rx_lock);
> > -		virtio_transport_inc_rx_pkt(vvs, pkt);
> > -		list_add_tail(&pkt->list, &vvs->rx_queue);
> > -		spin_unlock_bh(&vvs->rx_lock);
> > +		if (buf) {
> > +			spin_lock_bh(&vvs->rx_lock);
> > +			virtio_transport_inc_rx_pkt(vvs, pkt->len);
> > +			list_add_tail(&buf->list, &vvs->rx_queue);
> > +			spin_unlock_bh(&vvs->rx_lock);
> >  
> > -		sk->sk_data_ready(sk);
> > -		return err;
> > +			sk->sk_data_ready(sk);
> > +		}
> 
> The return value of this function isn't used but the code still makes an
> effort to return errors.  Please return -ENOMEM when buf == NULL.
> 
> If you'd like to remove the return value that's fine too, but please do
> it for the whole function to be consistent.

I'll return -ENOMEM when the allocation fails.
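
A minimal userspace sketch of that error path, not the driver code: the
names recv_rw(), alloc_buf() and struct rx_state are hypothetical stand-ins
for the VIRTIO_VSOCK_OP_RW branch, the buffer allocator and the per-socket
state, and the 'fail' flag exists only to simulate an allocation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-socket receive accounting. */
struct rx_state {
	size_t rx_bytes;	/* bytes currently accounted as queued */
	size_t queued;		/* number of queued buffers */
};

/* Hypothetical allocator: returns NULL on failure. The 'fail' flag lets
 * the caller simulate an out-of-memory condition. */
static void *alloc_buf(size_t len, int fail)
{
	return fail ? NULL : malloc(len);
}

/* Model of the OP_RW branch: account and queue the data on success,
 * report -ENOMEM to the caller when the allocation fails. */
static int recv_rw(struct rx_state *st, size_t len, int fail)
{
	void *buf;

	assert(st != NULL);

	buf = alloc_buf(len, fail);
	if (!buf)
		return -ENOMEM;

	st->rx_bytes += len;
	st->queued++;
	free(buf);	/* a real queue would keep the buffer; freed here so the sketch does not leak */
	return 0;
}
```

On failure the accounting is left untouched, so the caller sees the error
without any side effect on the queue.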

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 2/8] vsock/virtio: free packets during the socket release
  2019-05-16 15:32   ` Stefan Hajnoczi
@ 2019-05-17  8:26     ` Stefano Garzarella
  2019-05-17  8:26     ` Stefano Garzarella
  1 sibling, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-17  8:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Jason Wang

On Thu, May 16, 2019 at 04:32:18PM +0100, Stefan Hajnoczi wrote:
> On Fri, May 10, 2019 at 02:58:37PM +0200, Stefano Garzarella wrote:
> > When the socket is released, we should free all packets
> > queued in the per-socket list in order to avoid a memory
> > leak.
> > 
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> >  net/vmw_vsock/virtio_transport_common.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> 
> Ouch, this would be nice as a separate patch that can be merged right
> away (with s/virtio_vsock_buf/virtio_vsock_pkt/).

Okay, I'll fix this patch following David's comment and send it
as a separate patch using virtio_vsock_pkt.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-17  8:25     ` Stefano Garzarella
  2019-05-20  8:57       ` Stefan Hajnoczi
@ 2019-05-20  8:57       ` Stefan Hajnoczi
  1 sibling, 0 replies; 75+ messages in thread
From: Stefan Hajnoczi @ 2019-05-20  8:57 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Jason Wang

[-- Attachment #1: Type: text/plain, Size: 1505 bytes --]

On Fri, May 17, 2019 at 10:25:05AM +0200, Stefano Garzarella wrote:
> On Thu, May 16, 2019 at 04:25:33PM +0100, Stefan Hajnoczi wrote:
> > On Fri, May 10, 2019 at 02:58:36PM +0200, Stefano Garzarella wrote:
> > > +static struct virtio_vsock_buf *
> > > +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> > > +{
> > > +	struct virtio_vsock_buf *buf;
> > > +
> > > +	if (pkt->len == 0)
> > > +		return NULL;
> > > +
> > > +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> > > +	if (!buf)
> > > +		return NULL;
> > > +
> > > +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> > > +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> > > +	 * we are not using more memory than that counted by the credit mechanism.
> > > +	 */
> > > +	if (zero_copy && pkt->len == pkt->buf_len) {
> > > +		buf->addr = pkt->buf;
> > > +		pkt->buf = NULL;
> > > +	} else {
> > > +		buf->addr = kmalloc(pkt->len, GFP_KERNEL);
> > 
> > buf and buf->addr could be allocated in a single call, though I'm not
> > sure how big an optimization this is.
> > 
> 
> IIUC, in the case of zero-copy I should allocate only the buf,
> otherwise I should allocate both buf and buf->addr in a single call
> when I'm doing a full-copy.
> 
> Is it correct?

Yes, but it's your choice whether the optimization is worthwhile.  If it
increases the complexity of the code and doesn't result in a measurable
improvement, then it's not worth it.
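
A rough userspace sketch of the two cases, to show where the extra
complexity would come from. It assumes a hypothetical struct vbuf with a
C99 flexible array member standing in for struct virtio_vsock_buf; the
helper names and the owns_addr flag are illustrative only and do not come
from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct virtio_vsock_buf. */
struct vbuf {
	void *addr;		/* points at the payload */
	size_t len;		/* payload length in bytes */
	int owns_addr;		/* 1 if addr was allocated separately */
	unsigned char data[];	/* inline payload, full-copy case only */
};

/* Zero-copy: allocate only the descriptor and take over *payload. */
static struct vbuf *vbuf_take(void **payload, size_t len)
{
	struct vbuf *b;

	assert(payload != NULL && *payload != NULL);

	b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;

	b->addr = *payload;
	b->len = len;
	b->owns_addr = 1;
	*payload = NULL;	/* ownership moved, like pkt->buf = NULL */
	return b;
}

/* Full copy: descriptor and payload come from a single allocation. */
static struct vbuf *vbuf_copy(const void *src, size_t len)
{
	struct vbuf *b;

	assert(src != NULL);

	b = calloc(1, sizeof(*b) + len);
	if (!b)
		return NULL;

	memcpy(b->data, src, len);
	b->addr = b->data;
	b->len = len;
	return b;
}

/* The free path has to distinguish the two cases. */
static void vbuf_free(struct vbuf *b)
{
	if (b && b->owns_addr)
		free(b->addr);
	free(b);
}
```

The single allocation saves one call in the copy path, but the free path
now needs to know which case produced the buffer.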

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput
  2019-05-13  9:33 ` [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput Jason Wang
  2019-05-13 16:49   ` Stefano Garzarella
  2019-05-13 16:49   ` Stefano Garzarella
@ 2019-05-20 14:09   ` Stefano Garzarella
  2019-05-20 14:09   ` Stefano Garzarella
  3 siblings, 0 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-20 14:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm, Stefan Hajnoczi

On Mon, May 13, 2019 at 05:33:40PM +0800, Jason Wang wrote:
> 
> On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
> > While I was testing this new series (v2) I discovered a huge use of memory
> > and a memory leak in the virtio-vsock driver in the guest when I sent
> > 1-byte packets to the guest.
> > 
> > These issues have been present since the introduction of the virtio-vsock
> > driver. I added patches 1 and 2 to fix them in this series in order
> > to better track the performance trends.
> > 
> > v1: https://patchwork.kernel.org/cover/10885431/
> > 
> > v2:
> > - Add patch 1 to limit the memory usage
> > - Add patch 2 to avoid memory leak during the socket release
> > - Add patch 3 to fix locking of fwd_cnt and buf_alloc
> > - Patch 4: fix 'free_space' type (u32 instead of s64) [Stefan]
> > - Patch 5: Avoid integer underflow of iov_len [Stefan]
> > - Patch 5: Fix packet capture in order to see the exact packets that are
> >             delivered. [Stefan]
> > - Add patch 8 to make the RX buffer size tunable [Stefan]
> > 
> > Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> > support.
> > As Michael suggested in the v1, I booted host and guest with 'nosmap', and I
> > added a column with virtio-net+vhost-net performance.
> > 
> > A brief description of patches:
> > - Patches 1+2: limit the memory usage with an extra copy and avoid memory leak
> > - Patches 3+4: fix locking and reduce the number of credit update messages sent
> >                 to the transmitter
> > - Patches 5+6: allow the host to split packets on multiple buffers and use
> >                 VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> > - Patches 7+8: increase RX buffer size to 64 KiB
> > 
> >                      host -> guest [Gbps]
> > pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
> >                                                                       TCP_NODELAY
> > 64         0.068     0.063    0.130    0.131    0.128         0.188     0.187
> > 256        0.274     0.236    0.392    0.338    0.282         0.749     0.654
> > 512        0.531     0.457    0.862    0.725    0.602         1.419     1.414
> > 1K         0.954     0.827    1.591    1.598    1.548         2.599     2.640
> > 2K         1.783     1.543    3.731    3.637    3.469         4.530     4.754
> > 4K         3.332     3.436    7.164    7.124    6.494         7.738     7.696
> > 8K         5.792     5.530   11.653   11.787   11.444        12.307    11.850
> > 16K        8.405     8.462   16.372   16.855   17.562        16.936    16.954
> > 32K       14.208    13.669   18.945   20.009   23.128        21.980    23.015
> > 64K       21.082    18.893   20.266   20.903   30.622        27.290    27.383
> > 128K      20.696    20.148   20.112   21.746   32.152        30.446    30.990
> > 256K      20.801    20.589   20.725   22.685   34.721        33.151    32.745
> > 512K      21.220    20.465   20.432   22.106   34.496        36.847    31.096
> > 
> >                      guest -> host [Gbps]
> > pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
> >                                                                       TCP_NODELAY
> > 64         0.089     0.091    0.120    0.115    0.117         0.274     0.272
> > 256        0.352     0.354    0.452    0.445    0.451         1.085     1.136
> > 512        0.705     0.704    0.893    0.858    0.898         2.131     1.882
> > 1K         1.394     1.433    1.721    1.669    1.691         3.984     3.576
> > 2K         2.818     2.874    3.316    3.249    3.303         6.719     6.359
> > 4K         5.293     5.397    6.129    5.933    6.082        10.105     9.860
> > 8K         8.890     9.151   10.990   10.545   10.519        15.239    14.868
> > 16K       11.444    11.018   12.074   15.255   15.577        20.551    20.848
> > 32K       11.229    10.875   10.857   24.401   25.227        26.294    26.380
> > 64K       10.832    10.545   10.816   39.487   39.616        34.996    32.041
> > 128K      10.435    10.241   10.500   39.813   40.012        38.379    35.055
> > 256K      10.263     9.866    9.845   34.971   35.143        36.559    37.232
> > 512K      10.224    10.060   10.092   35.469   34.627        34.963    33.401
> > 
> > As Stefan suggested in the v1, this time I also measured the efficiency,
> > defined as follows:
> >      efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> > 
> > The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> > but it's provided for free from iperf3 and could be an indication.
> > 
> >          host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
> >                                                                       TCP_NODELAY
> > 64          0.94      0.59     3.96     4.06     4.09          2.82      2.11
> > 256         2.62      2.50     6.45     6.09     5.81          9.64      8.73
> > 512         5.16      4.87    13.16    12.39    11.67         17.83     17.76
> > 1K          9.16      8.85    24.98    24.97    25.01         32.57     32.04
> > 2K         17.41     17.03    49.09    48.59    49.22         55.31     57.14
> > 4K         32.99     33.62    90.80    90.98    91.72         91.79     91.40
> > 8K         58.51     59.98   153.53   170.83   167.31        137.51    132.85
> > 16K        89.32     95.29   216.98   264.18   260.95        176.05    176.05
> > 32K       152.94    167.10   285.75   387.02   360.81        215.49    226.30
> > 64K       250.38    307.20   317.65   489.53   472.70        238.97    244.27
> > 128K      327.99    335.24   335.76   523.71   486.41        253.29    260.86
> > 256K      327.06    334.24   338.64   533.76   509.85        267.78    266.22
> > 512K      337.36    330.61   334.95   512.90   496.35        280.42    241.43
> > 
> >          guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
> >                                                                       TCP_NODELAY
> > 64          0.90      0.91     1.37     1.32     1.35          2.15      2.13
> > 256         3.59      3.55     5.23     5.19     5.29          8.50      8.89
> > 512         7.19      7.08    10.21     9.95    10.38         16.74     14.71
> > 1K         14.15     14.34    19.85    19.06    19.33         31.44     28.11
> > 2K         28.44     29.09    37.78    37.18    37.49         53.07     50.63
> > 4K         55.37     57.60    71.02    69.27    70.97         81.56     79.32
> > 8K        105.58    100.45   111.95   124.68   123.61        120.85    118.66
> > 16K       141.63    138.24   137.67   187.41   190.20        160.43    163.00
> > 32K       147.56    143.09   138.48   296.41   301.04        214.64    223.94
> > 64K       144.81    143.27   138.49   433.98   462.26        298.86    269.71
> > 128K      150.14    147.99   146.85   511.36   514.29        350.17    298.09
> > 256K      156.69    152.25   148.69   542.19   549.97        326.42    333.32
> > 512K      157.29    153.35   152.22   546.52   533.24        315.55    302.27
> > 
> > [1] https://github.com/stefano-garzarella/iperf/
> 
> 
> Hi:
> 
> Do you have any explanation for why vsock is better here? Is this because of
> the mergeable buffer? If so, we need to test with mrg_rxbuf=off.
> 

Hi Jason,
I tried disabling the mergeable buffer, but I got even worse performance
with virtio-net.

Do you think the differences could be related to the TCP/IP stack?

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-15  2:48             ` Jason Wang
  (?)
  (?)
@ 2019-05-28 16:45             ` Stefano Garzarella
  2019-05-29  0:59               ` Jason Wang
  2019-05-29  0:59               ` Jason Wang
  -1 siblings, 2 replies; 75+ messages in thread
From: Stefano Garzarella @ 2019-05-28 16:45 UTC (permalink / raw)
  To: Jason Wang, Stefan Hajnoczi
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm

On Wed, May 15, 2019 at 10:48:44AM +0800, Jason Wang wrote:
> 
> On 2019/5/15 12:35 AM, Stefano Garzarella wrote:
> > On Tue, May 14, 2019 at 11:25:34AM +0800, Jason Wang wrote:
> > > On 2019/5/14 1:23 AM, Stefano Garzarella wrote:
> > > > On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
> > > > > On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
> > > > > > +static struct virtio_vsock_buf *
> > > > > > +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> > > > > > +{
> > > > > > +	struct virtio_vsock_buf *buf;
> > > > > > +
> > > > > > +	if (pkt->len == 0)
> > > > > > +		return NULL;
> > > > > > +
> > > > > > +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> > > > > > +	if (!buf)
> > > > > > +		return NULL;
> > > > > > +
> > > > > > > +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> > > > > > > +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> > > > > > > +	 * we do not use more memory than that counted by the credit mechanism.
> > > > > > +	 */
> > > > > > +	if (zero_copy && pkt->len == pkt->buf_len) {
> > > > > > +		buf->addr = pkt->buf;
> > > > > > +		pkt->buf = NULL;
> > > > > > +	} else {
> > > > > > Is the copy still needed if we're just a few bytes short? We met a similar
> > > > > > issue in virtio-net, and virtio-net solves this by always copying the first
> > > > > > 128 bytes for big packets.
> > > > > > 
> > > > > > See receive_big()
> > > > > I'm looking at it; it is more sophisticated.
> > > > > IIUC, virtio-net allocates an sk_buff with 128 bytes of buffer, then copies the
> > > > > first 128 bytes, then adds the buffer used to receive the packet as a frag to
> > > > > the skb.
> > > > 
> > > > Yes, and the point is that if the packet is smaller than 128 bytes, the pages
> > > > will be recycled.
> > > > 
> > > > 
> > > So it avoids the overhead of allocating a large buffer. I got it.
> > > 
> > > Just out of curiosity, why is the threshold 128 bytes?
> > 
> > 
> > From its name (GOOD_COPY_LEN), I think it is just a value that won't lose much
> > performance, e.g. the size of two cache lines.
> 
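
For reference, the copy-threshold idea discussed above can be sketched in
user-space C. This is only an illustration under stated assumptions, not the
virtio-net code: struct rx_msg, build_msg(), COPY_THRESHOLD and RX_BUF_SIZE are
hypothetical names, and 128 merely mirrors the GOOD_COPY_LEN value mentioned in
the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COPY_THRESHOLD 128   /* assumption: mirrors GOOD_COPY_LEN from the thread */
#define RX_BUF_SIZE    4096  /* assumption: size of a pre-posted receive buffer */

/* Hypothetical message: a small linear head plus an optional fragment that
 * still points into the receive buffer. */
struct rx_msg {
	unsigned char head[COPY_THRESHOLD];
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/* Always copy up to COPY_THRESHOLD bytes into the head; reference the rest
 * in place. Returns 1 when the whole payload fit in the head, so the receive
 * buffer can be recycled immediately, and 0 when msg still references it. */
static int build_msg(struct rx_msg *msg, const unsigned char *rx_buf, size_t len)
{
	size_t copy = len < COPY_THRESHOLD ? len : COPY_THRESHOLD;

	assert(len <= RX_BUF_SIZE);

	memcpy(msg->head, rx_buf, copy);
	msg->head_len = copy;
	msg->frag_len = len - copy;
	msg->frag = msg->frag_len ? rx_buf + copy : NULL;

	return msg->frag_len == 0;
}
```

With this layout a 1-byte packet costs one small copy and gives the large
buffer straight back, while a big packet pays at most COPY_THRESHOLD bytes of
copying.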

Jason, Stefan,
since I'm removing the patches that increase the buffers to 64 KiB and I'm
adding a threshold for small packets, I would simplify this patch by
removing the new buffer allocation and copying small packets into the
buffers already queued (if there is space).
This way, I should solve the issue with 1-byte packets.

Do you think that could be better?

Thanks,
Stefano
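
A rough user-space sketch of the proposal above, only to make the idea
concrete. It is not the driver code: struct rx_buf, struct rx_queue,
rx_queue_add(), RX_BUF_SIZE and SMALL_PKT_THRESHOLD are hypothetical names and
values, and it assumes stream semantics, so payloads may be merged. A packet
that is not merged is copied into a fresh buffer here purely for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE         4096 /* assumption: size of each queued buffer */
#define SMALL_PKT_THRESHOLD 128  /* assumption: "small packet" cut-off */

struct rx_buf {
	struct rx_buf *next;
	size_t len;                      /* bytes currently used in data[] */
	unsigned char data[RX_BUF_SIZE];
};

struct rx_queue {
	struct rx_buf *head, *tail;
	size_t nbufs;                    /* number of buffers held by the queue */
};

/* Queue one payload. A small payload is appended to the last queued buffer
 * when it fits; otherwise a new buffer is queued.
 * Returns 0 on success, -1 on allocation failure. */
static int rx_queue_add(struct rx_queue *q, const unsigned char *payload,
			size_t len)
{
	struct rx_buf *last = q->tail;
	struct rx_buf *buf;

	assert(len <= RX_BUF_SIZE);

	if (last && len <= SMALL_PKT_THRESHOLD &&
	    len <= RX_BUF_SIZE - last->len) {
		memcpy(last->data + last->len, payload, len);
		last->len += len;
		return 0;
	}

	buf = malloc(sizeof(*buf));
	if (!buf)
		return -1;

	buf->next = NULL;
	buf->len = len;
	memcpy(buf->data, payload, len);

	if (q->tail)
		q->tail->next = buf;
	else
		q->head = buf;
	q->tail = buf;
	q->nbufs++;

	return 0;
}

/* Release every buffer still held by the queue. */
static void rx_queue_free(struct rx_queue *q)
{
	while (q->head) {
		struct rx_buf *next = q->head->next;

		free(q->head);
		q->head = next;
	}
	q->tail = NULL;
	q->nbufs = 0;
}
```

In this sketch a burst of 1-byte packets keeps filling the same queued buffer
instead of pinning one full-size buffer per packet.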

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
  2019-05-28 16:45             ` Stefano Garzarella
@ 2019-05-29  0:59               ` Jason Wang
  2019-05-29  0:59               ` Jason Wang
  1 sibling, 0 replies; 75+ messages in thread
From: Jason Wang @ 2019-05-29  0:59 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi
  Cc: netdev, David S. Miller, Michael S. Tsirkin, virtualization,
	linux-kernel, kvm


On 2019/5/29 12:45 AM, Stefano Garzarella wrote:
> On Wed, May 15, 2019 at 10:48:44AM +0800, Jason Wang wrote:
>> On 2019/5/15 12:35 AM, Stefano Garzarella wrote:
>>> On Tue, May 14, 2019 at 11:25:34AM +0800, Jason Wang wrote:
>>>> On 2019/5/14 1:23 AM, Stefano Garzarella wrote:
>>>>> On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
>>>>>> On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
>>>>>>> +static struct virtio_vsock_buf *
>>>>>>> +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
>>>>>>> +{
>>>>>>> +	struct virtio_vsock_buf *buf;
>>>>>>> +
>>>>>>> +	if (pkt->len == 0)
>>>>>>> +		return NULL;
>>>>>>> +
>>>>>>> +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
>>>>>>> +	if (!buf)
>>>>>>> +		return NULL;
>>>>>>> +
>>>>>>> +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
>>>>>>> +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
>>>>>>> +	 * we do not use more memory than that counted by the credit mechanism.
>>>>>>> +	 */
>>>>>>> +	if (zero_copy && pkt->len == pkt->buf_len) {
>>>>>>> +		buf->addr = pkt->buf;
>>>>>>> +		pkt->buf = NULL;
>>>>>>> +	} else {
>>>>>> Is the copy still needed if we're just a few bytes short? We met a similar
>>>>>> issue in virtio-net, and virtio-net solves this by always copying the first
>>>>>> 128 bytes for big packets.
>>>>>>
>>>>>> See receive_big()
>>>>> I'm looking at it; it is more sophisticated.
>>>>> IIUC, virtio-net allocates an sk_buff with 128 bytes of buffer, then copies the
>>>>> first 128 bytes, then adds the buffer used to receive the packet as a frag to
>>>>> the skb.
>>>> Yes, and the point is that if the packet is smaller than 128 bytes, the pages
>>>> will be recycled.
>>>>
>>>>
>>> So it avoids the overhead of allocating a large buffer. I got it.
>>>
>>> Just out of curiosity, why is the threshold 128 bytes?
>>
>> From its name (GOOD_COPY_LEN), I think it is just a value that won't lose much
>> performance, e.g. the size of two cache lines.
>>
> Jason, Stefan,
> since I'm removing the patches that increase the buffers to 64 KiB and I'm
> adding a threshold for small packets, I would simplify this patch by
> removing the new buffer allocation and copying small packets into the
> buffers already queued (if there is space).
> This way, I should solve the issue with 1-byte packets.
>
> Do you think that could be better?


I think so.

Thanks


>
> Thanks,
> Stefano

^ permalink raw reply	[flat|nested] 75+ messages in thread
