netdev.vger.kernel.org archive mirror
* [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
@ 2019-07-17 11:30 Stefano Garzarella
  2019-07-17 11:30 ` [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket Stefano Garzarella
                   ` (7 more replies)
  0 siblings, 8 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-17 11:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Stefan Hajnoczi, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

This series tries to increase the throughput of virtio-vsock with slight
changes.
While testing the v2 of this series I discovered huge memory usage, so I
added patch 1 to mitigate this issue. I kept it in this series in order
to better track the performance trends.

v4:
- rebased all patches on current master (conflicts in Patch 4)
- Patch 1: added Stefan's R-b
- Patch 3: removed lock when buf_alloc is written [David];
           moved this patch after "vsock/virtio: reduce credit update messages"
           to make it clearer
- Patch 4: vhost_exceeds_weight() was recently introduced, so I resolved some
           conflicts

v3: https://patchwork.kernel.org/cover/10970145

v2: https://patchwork.kernel.org/cover/10938743

v1: https://patchwork.kernel.org/cover/10885431

Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
support. As Michael suggested in the v1, I booted host and guest with 'nosmap'.

A brief description of patches:
- Patch 1:     limit the memory usage with an extra copy for small packets
- Patches 2+3: reduce the number of credit update messages sent to the
               transmitter
- Patches 4+5: allow the host to split packets on multiple buffers and use
               VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed

                    host -> guest [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.032     0.030    0.048    0.051
64         0.061     0.059    0.108    0.117
128        0.122     0.112    0.227    0.234
256        0.244     0.241    0.418    0.415
512        0.459     0.466    0.847    0.865
1K         0.927     0.919    1.657    1.641
2K         1.884     1.813    3.262    3.269
4K         3.378     3.326    6.044    6.195
8K         5.637     5.676   10.141   11.287
16K        8.250     8.402   15.976   16.736
32K       13.327    13.204   19.013   20.515
64K       21.241    21.341   20.973   21.879
128K      21.851    22.354   21.816   23.203
256K      21.408    21.693   21.846   24.088
512K      21.600    21.899   21.921   24.106

                    guest -> host [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.045     0.046    0.057    0.057
64         0.089     0.091    0.103    0.104
128        0.170     0.179    0.192    0.200
256        0.364     0.351    0.361    0.379
512        0.709     0.699    0.731    0.790
1K         1.399     1.407    1.395    1.427
2K         2.670     2.684    2.745    2.835
4K         5.171     5.199    5.305    5.451
8K         8.442     8.500   10.083    9.941
16K       12.305    12.259   13.519   15.385
32K       11.418    11.150   11.988   24.680
64K       10.778    10.659   11.589   35.273
128K      10.421    10.339   10.939   40.338
256K      10.300     9.719   10.508   36.562
512K       9.833     9.808   10.612   35.979

As Stefan suggested in the v1, I also measured the efficiency in this way:
    efficiency = Mbps / (%CPU_Host + %CPU_Guest)

The '%CPU_Guest' is measured inside the VM. I know that this is not the best
way, but it's provided for free by iperf3 and can serve as an indication.
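
To make the metric concrete, here is one value worked out from the tables in
this cover letter (only the sum of the two CPU loads can be inferred this way,
not how it is split between host and guest):

    512K, host -> guest, before opt:  21.600 Gbps at 335.58 Mbps/%CPU
    =>  %CPU_Host + %CPU_Guest  ~=  21600 / 335.58  ~=  64%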

        host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.35      0.45     0.79     1.02
64         0.56      0.80     1.41     1.54
128        1.11      1.52     3.03     3.12
256        2.20      2.16     5.44     5.58
512        4.17      4.18    10.96    11.46
1K         8.30      8.26    20.99    20.89
2K        16.82     16.31    39.76    39.73
4K        30.89     30.79    74.07    75.73
8K        53.74     54.49   124.24   148.91
16K       80.68     83.63   200.21   232.79
32K      132.27    132.52   260.81   357.07
64K      229.82    230.40   300.19   444.18
128K     332.60    329.78   331.51   492.28
256K     331.06    337.22   339.59   511.59
512K     335.58    328.50   331.56   504.56

        guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.43      0.43     0.53     0.56
64         0.85      0.86     1.04     1.10
128        1.63      1.71     2.07     2.13
256        3.48      3.35     4.02     4.22
512        6.80      6.67     7.97     8.63
1K        13.32     13.31    15.72    15.94
2K        25.79     25.92    30.84    30.98
4K        50.37     50.48    58.79    59.69
8K        95.90     96.15   107.04   110.33
16K      145.80    145.43   143.97   174.70
32K      147.06    144.74   146.02   282.48
64K      145.25    143.99   141.62   406.40
128K     149.34    146.96   147.49   489.34
256K     156.35    149.81   152.21   536.37
512K     151.65    150.74   151.52   519.93

[1] https://github.com/stefano-garzarella/iperf/

Stefano Garzarella (5):
  vsock/virtio: limit the memory used per-socket
  vsock/virtio: reduce credit update messages
  vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
  vhost/vsock: split packets to send using multiple buffers
  vsock/virtio: change the maximum packet size allowed

 drivers/vhost/vsock.c                   | 68 ++++++++++++-----
 include/linux/virtio_vsock.h            |  4 +-
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
 4 files changed, 134 insertions(+), 38 deletions(-)

-- 
2.20.1



* [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-17 11:30 [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
@ 2019-07-17 11:30 ` Stefano Garzarella
  2019-07-29 14:04   ` Michael S. Tsirkin
  2019-07-17 11:30 ` [PATCH v4 2/5] vsock/virtio: reduce credit update messages Stefano Garzarella
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-17 11:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Stefan Hajnoczi, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).

The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 4 KB
buffers, using up to 1 GB of memory per socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.
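
In other words:

    256 KB of credit / 1 byte of payload per packet = 262144 packets
    262144 packets x 4 KB preallocated buffer       = 1 GB of memory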

This patch mitigates this issue by copying the payload of small
packets (< 128 bytes) into the buffer of the last packet queued, in
order to avoid wasting memory.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/vhost/vsock.c                   |  2 +
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 60 +++++++++++++++++++++----
 4 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6a50e1d0529c..6c8390a2af52 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -329,6 +329,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
 		return NULL;
 	}
 
+	pkt->buf_len = pkt->len;
+
 	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
 	if (nbytes != pkt->len) {
 		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index e223e2632edd..7d973903f52e 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -52,6 +52,7 @@ struct virtio_vsock_pkt {
 	/* socket refcnt not held, only use for cancellation */
 	struct vsock_sock *vsk;
 	void *buf;
+	u32 buf_len;
 	u32 len;
 	u32 off;
 	bool reply;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 0815d1357861..082a30936690 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -307,6 +307,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 			break;
 		}
 
+		pkt->buf_len = buf_len;
 		pkt->len = buf_len;
 
 		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 6f1a8aff65c5..095221f94786 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -26,6 +26,9 @@
 /* How long to wait for graceful shutdown of a connection */
 #define VSOCK_CLOSE_TIMEOUT (8 * HZ)
 
+/* Threshold for detecting small packets to copy */
+#define GOOD_COPY_LEN  128
+
 static const struct virtio_transport *virtio_transport_get_ops(void)
 {
 	const struct vsock_transport *t = vsock_core_get_transport();
@@ -64,6 +67,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 		pkt->buf = kmalloc(len, GFP_KERNEL);
 		if (!pkt->buf)
 			goto out_pkt;
+
+		pkt->buf_len = len;
+
 		err = memcpy_from_msg(pkt->buf, info->msg, len);
 		if (err)
 			goto out;
@@ -841,24 +847,60 @@ virtio_transport_recv_connecting(struct sock *sk,
 	return err;
 }
 
+static void
+virtio_transport_recv_enqueue(struct vsock_sock *vsk,
+			      struct virtio_vsock_pkt *pkt)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	bool free_pkt = false;
+
+	pkt->len = le32_to_cpu(pkt->hdr.len);
+	pkt->off = 0;
+
+	spin_lock_bh(&vvs->rx_lock);
+
+	virtio_transport_inc_rx_pkt(vvs, pkt);
+
+	/* Try to copy small packets into the buffer of last packet queued,
+	 * to avoid wasting memory queueing the entire buffer with a small
+	 * payload.
+	 */
+	if (pkt->len <= GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) {
+		struct virtio_vsock_pkt *last_pkt;
+
+		last_pkt = list_last_entry(&vvs->rx_queue,
+					   struct virtio_vsock_pkt, list);
+
+		/* If there is space in the last packet queued, we copy the
+		 * new packet in its buffer.
+		 */
+		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
+			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
+			       pkt->len);
+			last_pkt->len += pkt->len;
+			free_pkt = true;
+			goto out;
+		}
+	}
+
+	list_add_tail(&pkt->list, &vvs->rx_queue);
+
+out:
+	spin_unlock_bh(&vvs->rx_lock);
+	if (free_pkt)
+		virtio_transport_free_pkt(pkt);
+}
+
 static int
 virtio_transport_recv_connected(struct sock *sk,
 				struct virtio_vsock_pkt *pkt)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
-	struct virtio_vsock_sock *vvs = vsk->trans;
 	int err = 0;
 
 	switch (le16_to_cpu(pkt->hdr.op)) {
 	case VIRTIO_VSOCK_OP_RW:
-		pkt->len = le32_to_cpu(pkt->hdr.len);
-		pkt->off = 0;
-
-		spin_lock_bh(&vvs->rx_lock);
-		virtio_transport_inc_rx_pkt(vvs, pkt);
-		list_add_tail(&pkt->list, &vvs->rx_queue);
-		spin_unlock_bh(&vvs->rx_lock);
-
+		virtio_transport_recv_enqueue(vsk, pkt);
 		sk->sk_data_ready(sk);
 		return err;
 	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
-- 
2.20.1



* [PATCH v4 2/5] vsock/virtio: reduce credit update messages
  2019-07-17 11:30 [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
  2019-07-17 11:30 ` [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket Stefano Garzarella
@ 2019-07-17 11:30 ` Stefano Garzarella
  2019-07-22  8:36   ` Stefan Hajnoczi
  2019-09-03  4:38   ` Michael S. Tsirkin
  2019-07-17 11:30 ` [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt() Stefano Garzarella
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-17 11:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Stefan Hajnoczi, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 7d973903f52e..49fc9d20bc43 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -41,6 +41,7 @@ struct virtio_vsock_sock {
 
 	/* Protected by rx_lock */
 	u32 fwd_cnt;
+	u32 last_fwd_cnt;
 	u32 rx_bytes;
 	struct list_head rx_queue;
 };
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 095221f94786..a85559d4d974 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
 {
 	spin_lock_bh(&vvs->tx_lock);
+	vvs->last_fwd_cnt = vvs->fwd_cnt;
 	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
 	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
 	spin_unlock_bh(&vvs->tx_lock);
@@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	struct virtio_vsock_sock *vvs = vsk->trans;
 	struct virtio_vsock_pkt *pkt;
 	size_t bytes, total = 0;
+	u32 free_space;
 	int err = -EFAULT;
 
 	spin_lock_bh(&vvs->rx_lock);
@@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 			virtio_transport_free_pkt(pkt);
 		}
 	}
+
+	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+
 	spin_unlock_bh(&vvs->rx_lock);
 
-	/* Send a credit pkt to peer */
-	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
-					    NULL);
+	/* We send a credit update only when the space available seen
+	 * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
+	 */
+	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
+		virtio_transport_send_credit_update(vsk,
+						    VIRTIO_VSOCK_TYPE_STREAM,
+						    NULL);
+	}
 
 	return total;
 
-- 
2.20.1



* [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
  2019-07-17 11:30 [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
  2019-07-17 11:30 ` [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket Stefano Garzarella
  2019-07-17 11:30 ` [PATCH v4 2/5] vsock/virtio: reduce credit update messages Stefano Garzarella
@ 2019-07-17 11:30 ` Stefano Garzarella
  2019-07-17 14:51   ` Michael S. Tsirkin
  2019-07-22  8:53   ` Stefan Hajnoczi
  2019-07-17 11:30 ` [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers Stefano Garzarella
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-17 11:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Stefan Hajnoczi, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
the same spinlock also in the TX path.

Also move buf_alloc under the same lock.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/linux/virtio_vsock.h            | 2 +-
 net/vmw_vsock/virtio_transport_common.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 49fc9d20bc43..4c7781f4b29b 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -35,7 +35,6 @@ struct virtio_vsock_sock {
 
 	/* Protected by tx_lock */
 	u32 tx_cnt;
-	u32 buf_alloc;
 	u32 peer_fwd_cnt;
 	u32 peer_buf_alloc;
 
@@ -43,6 +42,7 @@ struct virtio_vsock_sock {
 	u32 fwd_cnt;
 	u32 last_fwd_cnt;
 	u32 rx_bytes;
+	u32 buf_alloc;
 	struct list_head rx_queue;
 };
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a85559d4d974..34a2b42313b7 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
 
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
 {
-	spin_lock_bh(&vvs->tx_lock);
+	spin_lock_bh(&vvs->rx_lock);
 	vvs->last_fwd_cnt = vvs->fwd_cnt;
 	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
 	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
-	spin_unlock_bh(&vvs->tx_lock);
+	spin_unlock_bh(&vvs->rx_lock);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
 
-- 
2.20.1



* [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-17 11:30 [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (2 preceding siblings ...)
  2019-07-17 11:30 ` [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt() Stefano Garzarella
@ 2019-07-17 11:30 ` Stefano Garzarella
  2019-07-17 14:54   ` Michael S. Tsirkin
  2019-07-22  9:06   ` Stefan Hajnoczi
  2019-07-17 11:30 ` [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed Stefano Garzarella
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-17 11:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Stefan Hajnoczi, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

If the packets to be sent to the guest are bigger than the buffer
available, we can split them across multiple buffers and fix up
the length in the packet header.
This is safe since virtio-vsock supports only stream sockets.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/vhost/vsock.c                   | 66 ++++++++++++++++++-------
 net/vmw_vsock/virtio_transport_common.c | 15 ++++--
 2 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6c8390a2af52..9f57736fe15e 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -102,7 +102,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		struct iov_iter iov_iter;
 		unsigned out, in;
 		size_t nbytes;
-		size_t len;
+		size_t iov_len, payload_len;
 		int head;
 
 		spin_lock_bh(&vsock->send_pkt_list_lock);
@@ -147,8 +147,24 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		len = iov_length(&vq->iov[out], in);
-		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+		iov_len = iov_length(&vq->iov[out], in);
+		if (iov_len < sizeof(pkt->hdr)) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
+			break;
+		}
+
+		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
+		payload_len = pkt->len - pkt->off;
+
+		/* If the packet is greater than the space available in the
+		 * buffer, we split it using multiple buffers.
+		 */
+		if (payload_len > iov_len - sizeof(pkt->hdr))
+			payload_len = iov_len - sizeof(pkt->hdr);
+
+		/* Set the correct length in the header */
+		pkt->hdr.len = cpu_to_le32(payload_len);
 
 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
 		if (nbytes != sizeof(pkt->hdr)) {
@@ -157,33 +173,47 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
-		if (nbytes != pkt->len) {
+		nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
+				      &iov_iter);
+		if (nbytes != payload_len) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Faulted on copying pkt buf\n");
 			break;
 		}
 
-		vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
+		vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
 		added = true;
 
-		if (pkt->reply) {
-			int val;
-
-			val = atomic_dec_return(&vsock->queued_replies);
-
-			/* Do we have resources to resume tx processing? */
-			if (val + 1 == tx_vq->num)
-				restart_tx = true;
-		}
-
 		/* Deliver to monitoring devices all correctly transmitted
 		 * packets.
 		 */
 		virtio_transport_deliver_tap_pkt(pkt);
 
-		total_len += pkt->len;
-		virtio_transport_free_pkt(pkt);
+		pkt->off += payload_len;
+		total_len += payload_len;
+
+		/* If we didn't send all the payload we can requeue the packet
+		 * to send it with the next available buffer.
+		 */
+		if (pkt->off < pkt->len) {
+			spin_lock_bh(&vsock->send_pkt_list_lock);
+			list_add(&pkt->list, &vsock->send_pkt_list);
+			spin_unlock_bh(&vsock->send_pkt_list_lock);
+		} else {
+			if (pkt->reply) {
+				int val;
+
+				val = atomic_dec_return(&vsock->queued_replies);
+
+				/* Do we have resources to resume tx
+				 * processing?
+				 */
+				if (val + 1 == tx_vq->num)
+					restart_tx = true;
+			}
+
+			virtio_transport_free_pkt(pkt);
+		}
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
 	if (added)
 		vhost_signal(&vsock->dev, vq);
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 34a2b42313b7..56fab3f03d0e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -97,8 +97,17 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
 	struct virtio_vsock_pkt *pkt = opaque;
 	struct af_vsockmon_hdr *hdr;
 	struct sk_buff *skb;
+	size_t payload_len;
+	void *payload_buf;
 
-	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
+	/* A packet could be split to fit the RX buffer, so we can retrieve
+	 * the payload length from the header and the buffer pointer taking
+	 * care of the offset in the original packet.
+	 */
+	payload_len = le32_to_cpu(pkt->hdr.len);
+	payload_buf = pkt->buf + pkt->off;
+
+	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
 			GFP_ATOMIC);
 	if (!skb)
 		return NULL;
@@ -138,8 +147,8 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
 
 	skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));
 
-	if (pkt->len) {
-		skb_put_data(skb, pkt->buf, pkt->len);
+	if (payload_len) {
+		skb_put_data(skb, payload_buf, payload_len);
 	}
 
 	return skb;
-- 
2.20.1



* [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
  2019-07-17 11:30 [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (3 preceding siblings ...)
  2019-07-17 11:30 ` [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers Stefano Garzarella
@ 2019-07-17 11:30 ` Stefano Garzarella
  2019-07-17 14:59   ` Michael S. Tsirkin
  2019-07-22  9:07   ` Stefan Hajnoczi
  2019-07-22  9:08 ` [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefan Hajnoczi
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-17 11:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Stefan Hajnoczi, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

Now that we are able to split packets, we can avoid limiting
their size to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
packet size.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 56fab3f03d0e..94cc0fa3e848 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -181,8 +181,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	vvs = vsk->trans;
 
 	/* we can send less than pkt_len bytes */
-	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
-		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
 
 	/* virtio_transport_get_credit might return less than pkt_len credit */
 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
-- 
2.20.1



* Re: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
  2019-07-17 11:30 ` [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt() Stefano Garzarella
@ 2019-07-17 14:51   ` Michael S. Tsirkin
  2019-07-18  7:43     ` Stefano Garzarella
  2019-07-22  8:53   ` Stefan Hajnoczi
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-17 14:51 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:
> fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
> the same spinlock also if we are in the TX path.
> 
> Move also buf_alloc under the same lock.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

Wait a second is this a bugfix?
If it's used under the wrong lock won't values get corrupted?
Won't traffic then stall or more data get to sent than
credits?

> ---
>  include/linux/virtio_vsock.h            | 2 +-
>  net/vmw_vsock/virtio_transport_common.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 49fc9d20bc43..4c7781f4b29b 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -35,7 +35,6 @@ struct virtio_vsock_sock {
>  
>  	/* Protected by tx_lock */
>  	u32 tx_cnt;
> -	u32 buf_alloc;
>  	u32 peer_fwd_cnt;
>  	u32 peer_buf_alloc;
>  
> @@ -43,6 +42,7 @@ struct virtio_vsock_sock {
>  	u32 fwd_cnt;
>  	u32 last_fwd_cnt;
>  	u32 rx_bytes;
> +	u32 buf_alloc;
>  	struct list_head rx_queue;
>  };
>  
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index a85559d4d974..34a2b42313b7 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
>  
>  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
>  {
> -	spin_lock_bh(&vvs->tx_lock);
> +	spin_lock_bh(&vvs->rx_lock);
>  	vvs->last_fwd_cnt = vvs->fwd_cnt;
>  	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
>  	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> -	spin_unlock_bh(&vvs->tx_lock);
> +	spin_unlock_bh(&vvs->rx_lock);
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
>  
> -- 
> 2.20.1


* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-17 11:30 ` [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers Stefano Garzarella
@ 2019-07-17 14:54   ` Michael S. Tsirkin
  2019-07-18  7:50     ` Stefano Garzarella
  2019-07-22  9:06   ` Stefan Hajnoczi
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-17 14:54 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> If the packets to sent to the guest are bigger than the buffer
> available, we can split them, using multiple buffers and fixing
> the length in the packet header.
> This is safe since virtio-vsock supports only stream sockets.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

So how does it work right now? If an app
does sendmsg with a 64K buffer and the other
side publishes 4K buffers - does it just stall?


> ---
>  drivers/vhost/vsock.c                   | 66 ++++++++++++++++++-------
>  net/vmw_vsock/virtio_transport_common.c | 15 ++++--
>  2 files changed, 60 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 6c8390a2af52..9f57736fe15e 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -102,7 +102,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  		struct iov_iter iov_iter;
>  		unsigned out, in;
>  		size_t nbytes;
> -		size_t len;
> +		size_t iov_len, payload_len;
>  		int head;
>  
>  		spin_lock_bh(&vsock->send_pkt_list_lock);
> @@ -147,8 +147,24 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  			break;
>  		}
>  
> -		len = iov_length(&vq->iov[out], in);
> -		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> +		iov_len = iov_length(&vq->iov[out], in);
> +		if (iov_len < sizeof(pkt->hdr)) {
> +			virtio_transport_free_pkt(pkt);
> +			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
> +			break;
> +		}
> +
> +		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
> +		payload_len = pkt->len - pkt->off;
> +
> +		/* If the packet is greater than the space available in the
> +		 * buffer, we split it using multiple buffers.
> +		 */
> +		if (payload_len > iov_len - sizeof(pkt->hdr))
> +			payload_len = iov_len - sizeof(pkt->hdr);
> +
> +		/* Set the correct length in the header */
> +		pkt->hdr.len = cpu_to_le32(payload_len);
>  
>  		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
>  		if (nbytes != sizeof(pkt->hdr)) {
> @@ -157,33 +173,47 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  			break;
>  		}
>  
> -		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> -		if (nbytes != pkt->len) {
> +		nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
> +				      &iov_iter);
> +		if (nbytes != payload_len) {
>  			virtio_transport_free_pkt(pkt);
>  			vq_err(vq, "Faulted on copying pkt buf\n");
>  			break;
>  		}
>  
> -		vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
> +		vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
>  		added = true;
>  
> -		if (pkt->reply) {
> -			int val;
> -
> -			val = atomic_dec_return(&vsock->queued_replies);
> -
> -			/* Do we have resources to resume tx processing? */
> -			if (val + 1 == tx_vq->num)
> -				restart_tx = true;
> -		}
> -
>  		/* Deliver to monitoring devices all correctly transmitted
>  		 * packets.
>  		 */
>  		virtio_transport_deliver_tap_pkt(pkt);
>  
> -		total_len += pkt->len;
> -		virtio_transport_free_pkt(pkt);
> +		pkt->off += payload_len;
> +		total_len += payload_len;
> +
> +		/* If we didn't send all the payload we can requeue the packet
> +		 * to send it with the next available buffer.
> +		 */
> +		if (pkt->off < pkt->len) {
> +			spin_lock_bh(&vsock->send_pkt_list_lock);
> +			list_add(&pkt->list, &vsock->send_pkt_list);
> +			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +		} else {
> +			if (pkt->reply) {
> +				int val;
> +
> +				val = atomic_dec_return(&vsock->queued_replies);
> +
> +				/* Do we have resources to resume tx
> +				 * processing?
> +				 */
> +				if (val + 1 == tx_vq->num)
> +					restart_tx = true;
> +			}
> +
> +			virtio_transport_free_pkt(pkt);
> +		}
>  	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
>  	if (added)
>  		vhost_signal(&vsock->dev, vq);
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 34a2b42313b7..56fab3f03d0e 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -97,8 +97,17 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
>  	struct virtio_vsock_pkt *pkt = opaque;
>  	struct af_vsockmon_hdr *hdr;
>  	struct sk_buff *skb;
> +	size_t payload_len;
> +	void *payload_buf;
>  
> -	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
> +	/* A packet could be split to fit the RX buffer, so we can retrieve
> +	 * the payload length from the header and the buffer pointer taking
> +	 * care of the offset in the original packet.
> +	 */
> +	payload_len = le32_to_cpu(pkt->hdr.len);
> +	payload_buf = pkt->buf + pkt->off;
> +
> +	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
>  			GFP_ATOMIC);
>  	if (!skb)
>  		return NULL;
> @@ -138,8 +147,8 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
>  
>  	skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));
>  
> -	if (pkt->len) {
> -		skb_put_data(skb, pkt->buf, pkt->len);
> +	if (payload_len) {
> +		skb_put_data(skb, payload_buf, payload_len);
>  	}
>  
>  	return skb;
> -- 
> 2.20.1


* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
  2019-07-17 11:30 ` [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed Stefano Garzarella
@ 2019-07-17 14:59   ` Michael S. Tsirkin
  2019-07-18  7:52     ` Stefano Garzarella
  2019-07-22  9:07   ` Stefan Hajnoczi
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-17 14:59 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> Since now we are able to split packets, we can avoid limiting
> their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> packet size.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>


OK so this is kind of like GSO where we are passing
64K packets to the vsock and then split at the
low level.


> ---
>  net/vmw_vsock/virtio_transport_common.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 56fab3f03d0e..94cc0fa3e848 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -181,8 +181,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	vvs = vsk->trans;
>  
>  	/* we can send less than pkt_len bytes */
> -	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> -		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> +		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>  
>  	/* virtio_transport_get_credit might return less than pkt_len credit */
>  	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> -- 
> 2.20.1


* Re: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
  2019-07-17 14:51   ` Michael S. Tsirkin
@ 2019-07-18  7:43     ` Stefano Garzarella
  0 siblings, 0 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-18  7:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Wed, Jul 17, 2019 at 4:51 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:
> > fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
> > the same spinlock also if we are in the TX path.
> >
> > Move also buf_alloc under the same lock.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
> Wait a second is this a bugfix?
> If it's used under the wrong lock won't values get corrupted?
> Won't traffic then stall or more data get to sent than
> credits?

Before this series, we only read vvs->fwd_cnt and vvs->buf_alloc in this
function, but we used a different lock than the one used to write them.
I'm not sure whether corruption can happen, but if we want to avoid the
lock, we should use atomic operations or memory barriers.

Since we now also need to update vvs->last_fwd_cnt, in order to limit the
credit update messages, I decided to take the same lock used to protect
vvs->fwd_cnt and vvs->last_fwd_cnt.


Thanks,
Stefano

>
> > ---
> >  include/linux/virtio_vsock.h            | 2 +-
> >  net/vmw_vsock/virtio_transport_common.c | 4 ++--
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 49fc9d20bc43..4c7781f4b29b 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -35,7 +35,6 @@ struct virtio_vsock_sock {
> >
> >       /* Protected by tx_lock */
> >       u32 tx_cnt;
> > -     u32 buf_alloc;
> >       u32 peer_fwd_cnt;
> >       u32 peer_buf_alloc;
> >
> > @@ -43,6 +42,7 @@ struct virtio_vsock_sock {
> >       u32 fwd_cnt;
> >       u32 last_fwd_cnt;
> >       u32 rx_bytes;
> > +     u32 buf_alloc;
> >       struct list_head rx_queue;
> >  };
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index a85559d4d974..34a2b42313b7 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> >
> >  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> >  {
> > -     spin_lock_bh(&vvs->tx_lock);
> > +     spin_lock_bh(&vvs->rx_lock);
> >       vvs->last_fwd_cnt = vvs->fwd_cnt;
> >       pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> >       pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > -     spin_unlock_bh(&vvs->tx_lock);
> > +     spin_unlock_bh(&vvs->rx_lock);
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
> >
> > --
> > 2.20.1


* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-17 14:54   ` Michael S. Tsirkin
@ 2019-07-18  7:50     ` Stefano Garzarella
  2019-07-18  8:13       ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-18  7:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > If the packets to sent to the guest are bigger than the buffer
> > available, we can split them, using multiple buffers and fixing
> > the length in the packet header.
> > This is safe since virtio-vsock supports only stream sockets.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
> So how does it work right now? If an app
> does sendmsg with a 64K buffer and the other
> side publishes 4K buffers - does it just stall?

Before this series, the 64K (or bigger) user messages was split in 4K packets
(fixed in the code) and queued in an internal list for the TX worker.

After this series, we will queue up to 64K packets and then it will be split in
the TX worker, depending on the size of the buffers available in the
vring. (The idea was to allow EWMA or a configuration of the buffers size, but
for now we postponed it)
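
(Worked example: with 4 KB guest rx buffers, a single 64 KB packet queued by
the sender is now copied out by the TX worker in roughly 16 partial copies,
each preceded by a header carrying the adjusted 'len'; since every buffer
also holds that header, in practice it takes a little more than 16.)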

Note: virtio-vsock only supports stream socket for now.

Thanks,
Stefano


* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
  2019-07-17 14:59   ` Michael S. Tsirkin
@ 2019-07-18  7:52     ` Stefano Garzarella
  2019-07-18 12:33       ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-18  7:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > Since now we are able to split packets, we can avoid limiting
> > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > packet size.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
>
> OK so this is kind of like GSO where we are passing
> 64K packets to the vsock and then split at the
> low level.

Exactly, something like that in the Host->Guest path, instead in the
Guest->Host we use the entire 64K packet.

Thanks,
Stefano


* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-18  7:50     ` Stefano Garzarella
@ 2019-07-18  8:13       ` Michael S. Tsirkin
  2019-07-18  9:37         ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-18  8:13 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > If the packets to sent to the guest are bigger than the buffer
> > > available, we can split them, using multiple buffers and fixing
> > > the length in the packet header.
> > > This is safe since virtio-vsock supports only stream sockets.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >
> > So how does it work right now? If an app
> > does sendmsg with a 64K buffer and the other
> > side publishes 4K buffers - does it just stall?
> 
> Before this series, the 64K (or bigger) user messages was split in 4K packets
> (fixed in the code) and queued in an internal list for the TX worker.
> 
> After this series, we will queue up to 64K packets and then it will be split in
> the TX worker, depending on the size of the buffers available in the
> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> for now we postponed it)

Got it. Using workers for xmit is IMHO a bad idea btw.
Why is it done like this?

> Note: virtio-vsock only supports stream socket for now.
> 
> Thanks,
> Stefano


* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-18  8:13       ` Michael S. Tsirkin
@ 2019-07-18  9:37         ` Stefano Garzarella
  2019-07-18 11:35           ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-18  9:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > If the packets to sent to the guest are bigger than the buffer
> > > > available, we can split them, using multiple buffers and fixing
> > > > the length in the packet header.
> > > > This is safe since virtio-vsock supports only stream sockets.
> > > >
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > > So how does it work right now? If an app
> > > does sendmsg with a 64K buffer and the other
> > > side publishes 4K buffers - does it just stall?
> >
> > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > (fixed in the code) and queued in an internal list for the TX worker.
> >
> > After this series, we will queue up to 64K packets and then it will be split in
> > the TX worker, depending on the size of the buffers available in the
> > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > for now we postponed it)
>
> Got it. Using workers for xmit is IMHO a bad idea btw.
> Why is it done like this?

Honestly, I don't know the exact reasons for this design, but I suppose
that the idea was to have only one worker that uses the vring, and
multiple user threads that enqueue packets in the list.
This can simplify the code and we can put the user threads to sleep if
we don't have "credit" available (this means that the receiver doesn't
have space to receive the packet).

What are the drawbacks in your opinion?


Thanks,
Stefano


* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-18  9:37         ` Stefano Garzarella
@ 2019-07-18 11:35           ` Michael S. Tsirkin
  2019-07-19  8:08             ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-18 11:35 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > available, we can split them, using multiple buffers and fixing
> > > > > the length in the packet header.
> > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > >
> > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >
> > > > So how does it work right now? If an app
> > > > does sendmsg with a 64K buffer and the other
> > > > side publishes 4K buffers - does it just stall?
> > >
> > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > (fixed in the code) and queued in an internal list for the TX worker.
> > >
> > > After this series, we will queue up to 64K packets and then it will be split in
> > > the TX worker, depending on the size of the buffers available in the
> > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > for now we postponed it)
> >
> > Got it. Using workers for xmit is IMHO a bad idea btw.
> > Why is it done like this?
> 
> Honestly, I don't know the exact reasons for this design, but I suppose
> that the idea was to have only one worker that uses the vring, and
> multiple user threads that enqueue packets in the list.
> This can simplify the code and we can put the user threads to sleep if
> we don't have "credit" available (this means that the receiver doesn't
> have space to receive the packet).


I think you mean the reverse: even without credits you can copy from
user and queue up data, then process it without waking up the user
thread.
Does it help though? It certainly adds up work outside of
user thread context which means it's not accounted for
correctly.

Maybe we want more VQs. Would help improve parallelism. The question
would then become how to map sockets to VQs. With a simple hash
it's easy to create collisions ...


> 
> What are the drawbacks in your opinion?
> 
> 
> Thanks,
> Stefano

- More pressure on scheduler
- Increased latency


-- 
MST


* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
  2019-07-18  7:52     ` Stefano Garzarella
@ 2019-07-18 12:33       ` Michael S. Tsirkin
  2019-07-19  8:29         ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-18 12:33 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Thu, Jul 18, 2019 at 09:52:41AM +0200, Stefano Garzarella wrote:
> On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > > Since now we are able to split packets, we can avoid limiting
> > > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > > packet size.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >
> >
> > OK so this is kind of like GSO where we are passing
> > 64K packets to the vsock and then split at the
> > low level.
> 
> Exactly, something like that in the Host->Guest path, instead in the
> Guest->Host we use the entire 64K packet.
> 
> Thanks,
> Stefano

btw two allocations for each packet isn't great. How about
allocating the struct linearly with the data?
And all buffers are same length for you - so you can actually
do alloc_pages.
Allocating/freeing pages in a batch should also be considered.

-- 
MST


* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-18 11:35           ` Michael S. Tsirkin
@ 2019-07-19  8:08             ` Stefano Garzarella
  2019-07-19  8:21               ` Jason Wang
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-19  8:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > the length in the packet header.
> > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > >
> > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > >
> > > > > So how does it work right now? If an app
> > > > > does sendmsg with a 64K buffer and the other
> > > > > side publishes 4K buffers - does it just stall?
> > > >
> > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > >
> > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > the TX worker, depending on the size of the buffers available in the
> > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > for now we postponed it)
> > >
> > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > Why is it done like this?
> > 
> > Honestly, I don't know the exact reasons for this design, but I suppose
> > that the idea was to have only one worker that uses the vring, and
> > multiple user threads that enqueue packets in the list.
> > This can simplify the code and we can put the user threads to sleep if
> > we don't have "credit" available (this means that the receiver doesn't
> > have space to receive the packet).
> 
> 
> I think you mean the reverse: even without credits you can copy from
> user and queue up data, then process it without waking up the user
> thread.

I checked the code better, but it doesn't seem to do that.
The .sendmsg callback of af_vsock, check if the transport has space
(virtio-vsock transport returns the credit available). If there is no
space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.

When the transport receives an update of credit available on the other
peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
sleeping, that will queue the new packet.

So, in the current implementation, the TX worker doesn't check the
credit available, it only sends the packets.

> Does it help though? It certainly adds up work outside of
> user thread context which means it's not accounted for
> correctly.

I can try to xmit the packet directly in the user thread context, to see
the improvements.

> 
> Maybe we want more VQs. Would help improve parallelism. The question
> would then become how to map sockets to VQs. With a simple hash
> it's easy to create collisions ...

Yes, more VQs can help but the map question is not simple to answer.
Maybe we can do an hash on the (cid, port) or do some kind of estimation
of queue utilization and try to balance.
Should the mapping be unique?
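
Something as simple as hashing the tuple could be a first cut; a rough sketch
(purely illustrative: the helper below is made up and not part of this series,
and it gives no guarantee of a balanced mapping):

#include <linux/hash.h>

/* Hypothetical helper: pick a TX virtqueue for a (cid, port) pair.
 * Colliding sockets simply share the same queue.
 */
static u16 vsock_pick_tx_vq(u32 cid, u32 port, u16 num_vqs)
{
	return hash_32(cid ^ port, 32) % num_vqs;
}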

> 
> 
> > 
> > What are the drawbacks in your opinion?
> > 
> > 
> > Thanks,
> > Stefano
> 
> - More pressure on scheduler
> - Increased latency
> 

Thanks,
Stefano


* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-19  8:08             ` Stefano Garzarella
@ 2019-07-19  8:21               ` Jason Wang
  2019-07-19  8:39                 ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Jason Wang @ 2019-07-19  8:21 UTC (permalink / raw)
  To: Stefano Garzarella, Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, kvm


On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
> On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
>>> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
>>>>> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>>>> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
>>>>>>> If the packets to sent to the guest are bigger than the buffer
>>>>>>> available, we can split them, using multiple buffers and fixing
>>>>>>> the length in the packet header.
>>>>>>> This is safe since virtio-vsock supports only stream sockets.
>>>>>>>
>>>>>>> Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
>>>>>> So how does it work right now? If an app
>>>>>> does sendmsg with a 64K buffer and the other
>>>>>> side publishes 4K buffers - does it just stall?
>>>>> Before this series, the 64K (or bigger) user messages was split in 4K packets
>>>>> (fixed in the code) and queued in an internal list for the TX worker.
>>>>>
>>>>> After this series, we will queue up to 64K packets and then it will be split in
>>>>> the TX worker, depending on the size of the buffers available in the
>>>>> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
>>>>> for now we postponed it)
>>>> Got it. Using workers for xmit is IMHO a bad idea btw.
>>>> Why is it done like this?
>>> Honestly, I don't know the exact reasons for this design, but I suppose
>>> that the idea was to have only one worker that uses the vring, and
>>> multiple user threads that enqueue packets in the list.
>>> This can simplify the code and we can put the user threads to sleep if
>>> we don't have "credit" available (this means that the receiver doesn't
>>> have space to receive the packet).
>> I think you mean the reverse: even without credits you can copy from
>> user and queue up data, then process it without waking up the user
>> thread.
> I checked the code better, but it doesn't seem to do that.
> The .sendmsg callback of af_vsock, check if the transport has space
> (virtio-vsock transport returns the credit available). If there is no
> space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
>
> When the transport receives an update of credit available on the other
> peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> sleeping, that will queue the new packet.
>
> So, in the current implementation, the TX worker doesn't check the
> credit available, it only sends the packets.
>
>> Does it help though? It certainly adds up work outside of
>> user thread context which means it's not accounted for
>> correctly.
> I can try to xmit the packet directly in the user thread context, to see
> the improvements.


It will then look more like what virtio-net (and other networking
devices) do.


>
>> Maybe we want more VQs. Would help improve parallelism. The question
>> would then become how to map sockets to VQs. With a simple hash
>> it's easy to create collisions ...
> Yes, more VQs can help but the map question is not simple to answer.
> Maybe we can do an hash on the (cid, port) or do some kind of estimation
> of queue utilization and try to balance.
> Should the mapping be unique?
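
A minimal sketch of the hash-based mapping idea above (purely
illustrative; vsock_pick_tx_vq() and num_tx_vqs are hypothetical, and
jhash_2words() comes from <linux/jhash.h>):

    #include <linux/jhash.h>

    static u16 vsock_pick_tx_vq(u32 cid, u32 port, u16 num_tx_vqs)
    {
    	/* Deterministic but collision-prone: distinct (cid, port)
    	 * pairs can land on the same TX virtqueue.
    	 */
    	return jhash_2words(cid, port, 0) % num_tx_vqs;
    }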


It sounds to me like you want some kind of fair queuing? We already 
have several qdiscs that do this.

So if we use the kernel networking xmit path, all those issues could be 
addressed.

Thanks

>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
  2019-07-18 12:33       ` Michael S. Tsirkin
@ 2019-07-19  8:29         ` Stefano Garzarella
  0 siblings, 0 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-19  8:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Thu, Jul 18, 2019 at 08:33:40AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 09:52:41AM +0200, Stefano Garzarella wrote:
> > On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > > > Since now we are able to split packets, we can avoid limiting
> > > > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > > > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > > > packet size.
> > > >
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > >
> > > OK so this is kind of like GSO where we are passing
> > > 64K packets to the vsock and then split at the
> > > low level.
> > 
> > Exactly, something like that in the Host->Guest path, instead in the
> > Guest->Host we use the entire 64K packet.
> > 
> > Thanks,
> > Stefano
> 
> btw two allocations for each packet isn't great. How about
> allocating the struct linearly with the data?

Are you referring to the kzalloc() to allocate the 'struct
virtio_vsock_pkt', followed by the kmalloc() to allocate the buffer?

Indeed, they don't look great; I will try to do a single allocation.
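
A minimal sketch of what the single allocation could look like
(hypothetical, not an actual patch; it keeps the existing buf/buf_len
fields and simply places the data right after the descriptor):

    struct virtio_vsock_pkt *pkt;

    pkt = kzalloc(sizeof(*pkt) + buf_len, GFP_KERNEL);
    if (!pkt)
    	return NULL;

    /* The payload lives in the same allocation, right after the
     * struct, so a single kfree(pkt) releases both.
     */
    pkt->buf = (void *)(pkt + 1);
    pkt->buf_len = buf_len;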

> And all buffers are same length for you - so you can actually
> do alloc_pages.

Yes, Jason also suggested it, and we decided to postpone it since we
will try to reuse the virtio-net code, where it comes for free.

> Allocating/freeing pages in a batch should also be considered.

For the allocation of guest rx buffers we do some kind of batching (we
refill the queue when it drops below half), but only in this case :(
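
For reference, that refill policy boils down to a check like this
(paraphrased from the guest driver, so treat it as a sketch):

    /* Top the vring back up with fresh 4 KB rx buffers once half of
     * them have been handed over to the host.
     */
    if (vsock->rx_buf_nr < vsock->rx_buf_max_nr / 2)
    	virtio_vsock_rx_fill(vsock);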

I'll try to do more alloc/free batching.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-19  8:21               ` Jason Wang
@ 2019-07-19  8:39                 ` Stefano Garzarella
  2019-07-19  8:51                   ` Jason Wang
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-19  8:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, netdev, linux-kernel, Stefan Hajnoczi,
	David S. Miller, virtualization, kvm

On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
> 
> On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
> > On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > > > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > > > the length in the packet header.
> > > > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > > > > 
> > > > > > > > Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
> > > > > > > So how does it work right now? If an app
> > > > > > > does sendmsg with a 64K buffer and the other
> > > > > > > side publishes 4K buffers - does it just stall?
> > > > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > > > > 
> > > > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > > > the TX worker, depending on the size of the buffers available in the
> > > > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > > > for now we postponed it)
> > > > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > > > Why is it done like this?
> > > > Honestly, I don't know the exact reasons for this design, but I suppose
> > > > that the idea was to have only one worker that uses the vring, and
> > > > multiple user threads that enqueue packets in the list.
> > > > This can simplify the code and we can put the user threads to sleep if
> > > > we don't have "credit" available (this means that the receiver doesn't
> > > > have space to receive the packet).
> > > I think you mean the reverse: even without credits you can copy from
> > > user and queue up data, then process it without waking up the user
> > > thread.
> > I checked the code better, but it doesn't seem to do that.
> > The .sendmsg callback of af_vsock, check if the transport has space
> > (virtio-vsock transport returns the credit available). If there is no
> > space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
> > 
> > When the transport receives an update of credit available on the other
> > peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> > sleeping, that will queue the new packet.
> > 
> > So, in the current implementation, the TX worker doesn't check the
> > credit available, it only sends the packets.
> > 
> > > Does it help though? It certainly adds up work outside of
> > > user thread context which means it's not accounted for
> > > correctly.
> > I can try to xmit the packet directly in the user thread context, to see
> > the improvements.
> 
> 
> It will then looks more like what virtio-net (and other networking device)
> did.

I'll try ASAP, the changes should not be too complicated... I hope :)

> 
> 
> > 
> > > Maybe we want more VQs. Would help improve parallelism. The question
> > > would then become how to map sockets to VQs. With a simple hash
> > > it's easy to create collisions ...
> > Yes, more VQs can help but the map question is not simple to answer.
> > Maybe we can do an hash on the (cid, port) or do some kind of estimation
> > of queue utilization and try to balance.
> > Should the mapping be unique?
> 
> 
> It sounds to me you want some kind of fair queuing? We've already had
> several qdiscs that do this.

Thanks for pointing it out!

> 
> So if we use the kernel networking xmit path, all those issues could be
> addressed.

One more point in favor of AF_VSOCK + net-stack, but we have to evaluate
the possible drawbacks of using the net-stack (e.g. more latency due to
its complexity?).

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-19  8:39                 ` Stefano Garzarella
@ 2019-07-19  8:51                   ` Jason Wang
  2019-07-19  9:20                     ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Jason Wang @ 2019-07-19  8:51 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, netdev, linux-kernel, Stefan Hajnoczi,
	David S. Miller, virtualization, kvm


On 2019/7/19 4:39 PM, Stefano Garzarella wrote:
> On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
>> On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
>>> On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
>>>> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
>>>>> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>>>> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
>>>>>>> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>>>>>> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
>>>>>>>>> If the packets to sent to the guest are bigger than the buffer
>>>>>>>>> available, we can split them, using multiple buffers and fixing
>>>>>>>>> the length in the packet header.
>>>>>>>>> This is safe since virtio-vsock supports only stream sockets.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
>>>>>>>> So how does it work right now? If an app
>>>>>>>> does sendmsg with a 64K buffer and the other
>>>>>>>> side publishes 4K buffers - does it just stall?
>>>>>>> Before this series, the 64K (or bigger) user messages was split in 4K packets
>>>>>>> (fixed in the code) and queued in an internal list for the TX worker.
>>>>>>>
>>>>>>> After this series, we will queue up to 64K packets and then it will be split in
>>>>>>> the TX worker, depending on the size of the buffers available in the
>>>>>>> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
>>>>>>> for now we postponed it)
>>>>>> Got it. Using workers for xmit is IMHO a bad idea btw.
>>>>>> Why is it done like this?
>>>>> Honestly, I don't know the exact reasons for this design, but I suppose
>>>>> that the idea was to have only one worker that uses the vring, and
>>>>> multiple user threads that enqueue packets in the list.
>>>>> This can simplify the code and we can put the user threads to sleep if
>>>>> we don't have "credit" available (this means that the receiver doesn't
>>>>> have space to receive the packet).
>>>> I think you mean the reverse: even without credits you can copy from
>>>> user and queue up data, then process it without waking up the user
>>>> thread.
>>> I checked the code better, but it doesn't seem to do that.
>>> The .sendmsg callback of af_vsock, check if the transport has space
>>> (virtio-vsock transport returns the credit available). If there is no
>>> space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
>>>
>>> When the transport receives an update of credit available on the other
>>> peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
>>> sleeping, that will queue the new packet.
>>>
>>> So, in the current implementation, the TX worker doesn't check the
>>> credit available, it only sends the packets.
>>>
>>>> Does it help though? It certainly adds up work outside of
>>>> user thread context which means it's not accounted for
>>>> correctly.
>>> I can try to xmit the packet directly in the user thread context, to see
>>> the improvements.
>>
>> It will then looks more like what virtio-net (and other networking device)
>> did.
> I'll try ASAP, the changes should not be too complicated... I hope :)
>
>>
>>>> Maybe we want more VQs. Would help improve parallelism. The question
>>>> would then become how to map sockets to VQs. With a simple hash
>>>> it's easy to create collisions ...
>>> Yes, more VQs can help but the map question is not simple to answer.
>>> Maybe we can do an hash on the (cid, port) or do some kind of estimation
>>> of queue utilization and try to balance.
>>> Should the mapping be unique?
>>
>> It sounds to me you want some kind of fair queuing? We've already had
>> several qdiscs that do this.
> Thanks for pointing it out!
>
>> So if we use the kernel networking xmit path, all those issues could be
>> addressed.
> One more point to AF_VSOCK + net-stack, but we have to evaluate possible
> drawbacks in using the net-stack. (e.g. more latency due to the complexity
> of the net-stack?)


Yes, we need to benchmark the performance. But as we've noticed, the 
current vsock implementation is not efficient, and for stream sockets 
the overhead should be minimal. The most important thing is to avoid 
reinventing things that already exist.

Thanks


>
> Thanks,
> Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-19  8:51                   ` Jason Wang
@ 2019-07-19  9:20                     ` Stefano Garzarella
  0 siblings, 0 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-19  9:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, netdev, linux-kernel, Stefan Hajnoczi,
	David S. Miller, virtualization, kvm

On Fri, Jul 19, 2019 at 04:51:00PM +0800, Jason Wang wrote:
> 
> On 2019/7/19 4:39 PM, Stefano Garzarella wrote:
> > On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
> > > On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
> > > > On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > > > > > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > > > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > > > > > the length in the packet header.
> > > > > > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
> > > > > > > > > So how does it work right now? If an app
> > > > > > > > > does sendmsg with a 64K buffer and the other
> > > > > > > > > side publishes 4K buffers - does it just stall?
> > > > > > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > > > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > > > > > > 
> > > > > > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > > > > > the TX worker, depending on the size of the buffers available in the
> > > > > > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > > > > > for now we postponed it)
> > > > > > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > > > > > Why is it done like this?
> > > > > > Honestly, I don't know the exact reasons for this design, but I suppose
> > > > > > that the idea was to have only one worker that uses the vring, and
> > > > > > multiple user threads that enqueue packets in the list.
> > > > > > This can simplify the code and we can put the user threads to sleep if
> > > > > > we don't have "credit" available (this means that the receiver doesn't
> > > > > > have space to receive the packet).
> > > > > I think you mean the reverse: even without credits you can copy from
> > > > > user and queue up data, then process it without waking up the user
> > > > > thread.
> > > > I checked the code better, but it doesn't seem to do that.
> > > > The .sendmsg callback of af_vsock, check if the transport has space
> > > > (virtio-vsock transport returns the credit available). If there is no
> > > > space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
> > > > 
> > > > When the transport receives an update of credit available on the other
> > > > peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> > > > sleeping, that will queue the new packet.
> > > > 
> > > > So, in the current implementation, the TX worker doesn't check the
> > > > credit available, it only sends the packets.
> > > > 
> > > > > Does it help though? It certainly adds up work outside of
> > > > > user thread context which means it's not accounted for
> > > > > correctly.
> > > > I can try to xmit the packet directly in the user thread context, to see
> > > > the improvements.
> > > 
> > > It will then looks more like what virtio-net (and other networking device)
> > > did.
> > I'll try ASAP, the changes should not be too complicated... I hope :)
> > 
> > > 
> > > > > Maybe we want more VQs. Would help improve parallelism. The question
> > > > > would then become how to map sockets to VQs. With a simple hash
> > > > > it's easy to create collisions ...
> > > > Yes, more VQs can help but the map question is not simple to answer.
> > > > Maybe we can do an hash on the (cid, port) or do some kind of estimation
> > > > of queue utilization and try to balance.
> > > > Should the mapping be unique?
> > > 
> > > It sounds to me you want some kind of fair queuing? We've already had
> > > several qdiscs that do this.
> > Thanks for pointing it out!
> > 
> > > So if we use the kernel networking xmit path, all those issues could be
> > > addressed.
> > One more point to AF_VSOCK + net-stack, but we have to evaluate possible
> > drawbacks in using the net-stack. (e.g. more latency due to the complexity
> > of the net-stack?)
> 
> 
> Yes, we need benchmark the performance. But as we've noticed, current vsock
> implementation is not efficient, and for stream socket, the overhead should
> be minimal. The most important thing is to avoid reinventing things that has
> already existed.

Got it. I completely agree with you, and I want to avoid reinventing things
(surely in a worse way).

But the idea (also suggested by Michael) of seeing how fast a new
protocol separate from the networking stack can go is quite attractive :)

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages
  2019-07-17 11:30 ` [PATCH v4 2/5] vsock/virtio: reduce credit update messages Stefano Garzarella
@ 2019-07-22  8:36   ` Stefan Hajnoczi
  2019-09-03  4:38   ` Michael S. Tsirkin
  1 sibling, 0 replies; 69+ messages in thread
From: Stefan Hajnoczi @ 2019-07-22  8:36 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 654 bytes --]

On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> In order to reduce the number of credit update messages,
> we send them only when the space available seen by the
> transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  include/linux/virtio_vsock.h            |  1 +
>  net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)

It's an arbitrary limit but the risk of regressions is low since the
credit update traffic was so excessive:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
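
For context, the gist of the change under review (paraphrased, so treat
it as a sketch rather than the exact hunk): after the receiver dequeues
data, a credit update is sent only when the space still visible to the
transmitter drops below one maximum-sized packet.

    free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
    if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
    	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
    					    NULL);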

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
  2019-07-17 11:30 ` [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt() Stefano Garzarella
  2019-07-17 14:51   ` Michael S. Tsirkin
@ 2019-07-22  8:53   ` Stefan Hajnoczi
  1 sibling, 0 replies; 69+ messages in thread
From: Stefan Hajnoczi @ 2019-07-22  8:53 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 521 bytes --]

On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:
> fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
> the same spinlock also if we are in the TX path.
> 
> Move also buf_alloc under the same lock.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  include/linux/virtio_vsock.h            | 2 +-
>  net/vmw_vsock/virtio_transport_common.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
  2019-07-17 11:30 ` [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers Stefano Garzarella
  2019-07-17 14:54   ` Michael S. Tsirkin
@ 2019-07-22  9:06   ` Stefan Hajnoczi
  1 sibling, 0 replies; 69+ messages in thread
From: Stefan Hajnoczi @ 2019-07-22  9:06 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 613 bytes --]

On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> If the packets to sent to the guest are bigger than the buffer
> available, we can split them, using multiple buffers and fixing
> the length in the packet header.
> This is safe since virtio-vsock supports only stream sockets.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  drivers/vhost/vsock.c                   | 66 ++++++++++++++++++-------
>  net/vmw_vsock/virtio_transport_common.c | 15 ++++--
>  2 files changed, 60 insertions(+), 21 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
  2019-07-17 11:30 ` [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed Stefano Garzarella
  2019-07-17 14:59   ` Michael S. Tsirkin
@ 2019-07-22  9:07   ` Stefan Hajnoczi
  1 sibling, 0 replies; 69+ messages in thread
From: Stefan Hajnoczi @ 2019-07-22  9:07 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 495 bytes --]

On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> Since now we are able to split packets, we can avoid limiting
> their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> packet size.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
  2019-07-17 11:30 [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (4 preceding siblings ...)
  2019-07-17 11:30 ` [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed Stefano Garzarella
@ 2019-07-22  9:08 ` Stefan Hajnoczi
  2019-07-22  9:14   ` Stefano Garzarella
  2019-07-29 13:59 ` Michael S. Tsirkin
  2019-09-03  8:02 ` request for stable (was Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput) Michael S. Tsirkin
  7 siblings, 1 reply; 69+ messages in thread
From: Stefan Hajnoczi @ 2019-07-22  9:08 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 942 bytes --]

On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered an huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
> 
> v4:
> - rebased all patches on current master (conflicts is Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed lock when buf_alloc is written [David];
>            moved this patch after "vsock/virtio: reduce credit update messages"
>            to make it clearer
> - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
>            conflicts

Stefano: Do you want to continue experimenting before we merge this
patch series?  The code looks functionally correct and the performance
increases, so I'm happy for it to be merged.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
  2019-07-22  9:08 ` [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefan Hajnoczi
@ 2019-07-22  9:14   ` Stefano Garzarella
  2019-07-29 13:12     ` Stefan Hajnoczi
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-22  9:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: netdev, linux-kernel, David S. Miller, virtualization,
	Jason Wang, kvm, Michael S. Tsirkin

On Mon, Jul 22, 2019 at 10:08:35AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered an huge use of memory,
> > so I added patch 1 to mitigate this issue. I put it in this series in order
> > to better track the performance trends.
> > 
> > v4:
> > - rebased all patches on current master (conflicts is Patch 4)
> > - Patch 1: added Stefan's R-b
> > - Patch 3: removed lock when buf_alloc is written [David];
> >            moved this patch after "vsock/virtio: reduce credit update messages"
> >            to make it clearer
> > - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> >            conflicts
> 
> Stefano: Do you want to continue experimenting before we merge this
> patch series?  The code looks functionally correct and the performance
> increases, so I'm happy for it to be merged.

I think we can merge this series.

I'll continue to do other experiments (e.g. removing TX workers, allocating
pages, etc.) but I think these changes are prerequisites for the other patches,
so we can merge them.

Thank you very much for the reviews!
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
  2019-07-22  9:14   ` Stefano Garzarella
@ 2019-07-29 13:12     ` Stefan Hajnoczi
  0 siblings, 0 replies; 69+ messages in thread
From: Stefan Hajnoczi @ 2019-07-29 13:12 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, kvm, Michael S. Tsirkin, netdev, linux-kernel,
	virtualization, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1683 bytes --]

On Mon, Jul 22, 2019 at 11:14:34AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 22, 2019 at 10:08:35AM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > > This series tries to increase the throughput of virtio-vsock with slight
> > > changes.
> > > While I was testing the v2 of this series I discovered an huge use of memory,
> > > so I added patch 1 to mitigate this issue. I put it in this series in order
> > > to better track the performance trends.
> > > 
> > > v4:
> > > - rebased all patches on current master (conflicts is Patch 4)
> > > - Patch 1: added Stefan's R-b
> > > - Patch 3: removed lock when buf_alloc is written [David];
> > >            moved this patch after "vsock/virtio: reduce credit update messages"
> > >            to make it clearer
> > > - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> > >            conflicts
> > 
> > Stefano: Do you want to continue experimenting before we merge this
> > patch series?  The code looks functionally correct and the performance
> > increases, so I'm happy for it to be merged.
> 
> I think we can merge this series.
> 
> I'll continue to do other experiments (e.g. removing TX workers, allocating
> pages, etc.) but I think these changes are prerequisites for the other patches,
> so we can merge them.
> 
> Thank you very much for the reviews!

All patches have been reviewed by now.  Have an Ack for good measure:

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

The topics discussed in sub-threads relate to longer-term optimization
work that doesn't block this series.  Please merge.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
  2019-07-17 11:30 [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (5 preceding siblings ...)
  2019-07-22  9:08 ` [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefan Hajnoczi
@ 2019-07-29 13:59 ` Michael S. Tsirkin
  2019-07-30  9:40   ` Stefano Garzarella
  2019-09-03  8:02 ` request for stable (was Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput) Michael S. Tsirkin
  7 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-29 13:59 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered an huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.

Series:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

Can this go into net-next?


> v4:
> - rebased all patches on current master (conflicts is Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed lock when buf_alloc is written [David];
>            moved this patch after "vsock/virtio: reduce credit update messages"
>            to make it clearer
> - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
>            conflicts
> 
> v3: https://patchwork.kernel.org/cover/10970145
> 
> v2: https://patchwork.kernel.org/cover/10938743
> 
> v1: https://patchwork.kernel.org/cover/10885431
> 
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Micheal suggested in the v1, I booted host and guest with 'nosmap'.
> 
> A brief description of patches:
> - Patches 1:   limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
>                transmitter
> - Patches 4+5: allow the host to split packets on multiple buffers and use
>                VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> 
>                     host -> guest [Gbps]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.032     0.030    0.048    0.051
> 64         0.061     0.059    0.108    0.117
> 128        0.122     0.112    0.227    0.234
> 256        0.244     0.241    0.418    0.415
> 512        0.459     0.466    0.847    0.865
> 1K         0.927     0.919    1.657    1.641
> 2K         1.884     1.813    3.262    3.269
> 4K         3.378     3.326    6.044    6.195
> 8K         5.637     5.676   10.141   11.287
> 16K        8.250     8.402   15.976   16.736
> 32K       13.327    13.204   19.013   20.515
> 64K       21.241    21.341   20.973   21.879
> 128K      21.851    22.354   21.816   23.203
> 256K      21.408    21.693   21.846   24.088
> 512K      21.600    21.899   21.921   24.106
> 
>                     guest -> host [Gbps]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.045     0.046    0.057    0.057
> 64         0.089     0.091    0.103    0.104
> 128        0.170     0.179    0.192    0.200
> 256        0.364     0.351    0.361    0.379
> 512        0.709     0.699    0.731    0.790
> 1K         1.399     1.407    1.395    1.427
> 2K         2.670     2.684    2.745    2.835
> 4K         5.171     5.199    5.305    5.451
> 8K         8.442     8.500   10.083    9.941
> 16K       12.305    12.259   13.519   15.385
> 32K       11.418    11.150   11.988   24.680
> 64K       10.778    10.659   11.589   35.273
> 128K      10.421    10.339   10.939   40.338
> 256K      10.300     9.719   10.508   36.562
> 512K       9.833     9.808   10.612   35.979
> 
> As Stefan suggested in the v1, I measured also the efficiency in this way:
>     efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> 
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
> 
>         host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.35      0.45     0.79     1.02
> 64         0.56      0.80     1.41     1.54
> 128        1.11      1.52     3.03     3.12
> 256        2.20      2.16     5.44     5.58
> 512        4.17      4.18    10.96    11.46
> 1K         8.30      8.26    20.99    20.89
> 2K        16.82     16.31    39.76    39.73
> 4K        30.89     30.79    74.07    75.73
> 8K        53.74     54.49   124.24   148.91
> 16K       80.68     83.63   200.21   232.79
> 32K      132.27    132.52   260.81   357.07
> 64K      229.82    230.40   300.19   444.18
> 128K     332.60    329.78   331.51   492.28
> 256K     331.06    337.22   339.59   511.59
> 512K     335.58    328.50   331.56   504.56
> 
>         guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.43      0.43     0.53     0.56
> 64         0.85      0.86     1.04     1.10
> 128        1.63      1.71     2.07     2.13
> 256        3.48      3.35     4.02     4.22
> 512        6.80      6.67     7.97     8.63
> 1K        13.32     13.31    15.72    15.94
> 2K        25.79     25.92    30.84    30.98
> 4K        50.37     50.48    58.79    59.69
> 8K        95.90     96.15   107.04   110.33
> 16K      145.80    145.43   143.97   174.70
> 32K      147.06    144.74   146.02   282.48
> 64K      145.25    143.99   141.62   406.40
> 128K     149.34    146.96   147.49   489.34
> 256K     156.35    149.81   152.21   536.37
> 512K     151.65    150.74   151.52   519.93
> 
> [1] https://github.com/stefano-garzarella/iperf/
> 
> Stefano Garzarella (5):
>   vsock/virtio: limit the memory used per-socket
>   vsock/virtio: reduce credit update messages
>   vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
>   vhost/vsock: split packets to send using multiple buffers
>   vsock/virtio: change the maximum packet size allowed
> 
>  drivers/vhost/vsock.c                   | 68 ++++++++++++-----
>  include/linux/virtio_vsock.h            |  4 +-
>  net/vmw_vsock/virtio_transport.c        |  1 +
>  net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
>  4 files changed, 134 insertions(+), 38 deletions(-)
> 
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-17 11:30 ` [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket Stefano Garzarella
@ 2019-07-29 14:04   ` Michael S. Tsirkin
  2019-07-29 15:36     ` Stefano Garzarella
  2019-08-30  9:40     ` Stefano Garzarella
  0 siblings, 2 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-29 14:04 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> Since virtio-vsock was introduced, the buffers filled by the host
> and pushed to the guest using the vring, are directly queued in
> a per-socket list. These buffers are preallocated by the guest
> with a fixed size (4 KB).
> 
> The maximum amount of memory used by each socket should be
> controlled by the credit mechanism.
> The default credit available per-socket is 256 KB, but if we use
> only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> buffers, using up to 1 GB of memory per-socket. In addition, the
> guest will continue to fill the vring with new 4 KB free buffers
> to avoid starvation of other sockets.
> 
> This patch mitigates this issue copying the payload of small
> packets (< 128 bytes) into the buffer of last packet queued, in
> order to avoid wasting memory.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

This is good enough for net-next, but for net I think we
should figure out how to address the issue completely.
Can we make the accounting precise? What happens to
performance if we do?


> ---
>  drivers/vhost/vsock.c                   |  2 +
>  include/linux/virtio_vsock.h            |  1 +
>  net/vmw_vsock/virtio_transport.c        |  1 +
>  net/vmw_vsock/virtio_transport_common.c | 60 +++++++++++++++++++++----
>  4 files changed, 55 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 6a50e1d0529c..6c8390a2af52 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -329,6 +329,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>  		return NULL;
>  	}
>  
> +	pkt->buf_len = pkt->len;
> +
>  	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
>  	if (nbytes != pkt->len) {
>  		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index e223e2632edd..7d973903f52e 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -52,6 +52,7 @@ struct virtio_vsock_pkt {
>  	/* socket refcnt not held, only use for cancellation */
>  	struct vsock_sock *vsk;
>  	void *buf;
> +	u32 buf_len;
>  	u32 len;
>  	u32 off;
>  	bool reply;
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 0815d1357861..082a30936690 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -307,6 +307,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>  			break;
>  		}
>  
> +		pkt->buf_len = buf_len;
>  		pkt->len = buf_len;
>  
>  		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 6f1a8aff65c5..095221f94786 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -26,6 +26,9 @@
>  /* How long to wait for graceful shutdown of a connection */
>  #define VSOCK_CLOSE_TIMEOUT (8 * HZ)
>  
> +/* Threshold for detecting small packets to copy */
> +#define GOOD_COPY_LEN  128
> +
>  static const struct virtio_transport *virtio_transport_get_ops(void)
>  {
>  	const struct vsock_transport *t = vsock_core_get_transport();
> @@ -64,6 +67,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>  		pkt->buf = kmalloc(len, GFP_KERNEL);
>  		if (!pkt->buf)
>  			goto out_pkt;
> +
> +		pkt->buf_len = len;
> +
>  		err = memcpy_from_msg(pkt->buf, info->msg, len);
>  		if (err)
>  			goto out;
> @@ -841,24 +847,60 @@ virtio_transport_recv_connecting(struct sock *sk,
>  	return err;
>  }
>  
> +static void
> +virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> +			      struct virtio_vsock_pkt *pkt)
> +{
> +	struct virtio_vsock_sock *vvs = vsk->trans;
> +	bool free_pkt = false;
> +
> +	pkt->len = le32_to_cpu(pkt->hdr.len);
> +	pkt->off = 0;
> +
> +	spin_lock_bh(&vvs->rx_lock);
> +
> +	virtio_transport_inc_rx_pkt(vvs, pkt);
> +
> +	/* Try to copy small packets into the buffer of last packet queued,
> +	 * to avoid wasting memory queueing the entire buffer with a small
> +	 * payload.
> +	 */
> +	if (pkt->len <= GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) {
> +		struct virtio_vsock_pkt *last_pkt;
> +
> +		last_pkt = list_last_entry(&vvs->rx_queue,
> +					   struct virtio_vsock_pkt, list);
> +
> +		/* If there is space in the last packet queued, we copy the
> +		 * new packet in its buffer.
> +		 */
> +		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
> +			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
> +			       pkt->len);
> +			last_pkt->len += pkt->len;
> +			free_pkt = true;
> +			goto out;
> +		}
> +	}
> +
> +	list_add_tail(&pkt->list, &vvs->rx_queue);
> +
> +out:
> +	spin_unlock_bh(&vvs->rx_lock);
> +	if (free_pkt)
> +		virtio_transport_free_pkt(pkt);
> +}
> +
>  static int
>  virtio_transport_recv_connected(struct sock *sk,
>  				struct virtio_vsock_pkt *pkt)
>  {
>  	struct vsock_sock *vsk = vsock_sk(sk);
> -	struct virtio_vsock_sock *vvs = vsk->trans;
>  	int err = 0;
>  
>  	switch (le16_to_cpu(pkt->hdr.op)) {
>  	case VIRTIO_VSOCK_OP_RW:
> -		pkt->len = le32_to_cpu(pkt->hdr.len);
> -		pkt->off = 0;
> -
> -		spin_lock_bh(&vvs->rx_lock);
> -		virtio_transport_inc_rx_pkt(vvs, pkt);
> -		list_add_tail(&pkt->list, &vvs->rx_queue);
> -		spin_unlock_bh(&vvs->rx_lock);
> -
> +		virtio_transport_recv_enqueue(vsk, pkt);
>  		sk->sk_data_ready(sk);
>  		return err;
>  	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 14:04   ` Michael S. Tsirkin
@ 2019-07-29 15:36     ` Stefano Garzarella
  2019-07-29 15:49       ` Michael S. Tsirkin
  2019-07-29 16:01       ` Michael S. Tsirkin
  2019-08-30  9:40     ` Stefano Garzarella
  1 sibling, 2 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-29 15:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > Since virtio-vsock was introduced, the buffers filled by the host
> > and pushed to the guest using the vring, are directly queued in
> > a per-socket list. These buffers are preallocated by the guest
> > with a fixed size (4 KB).
> > 
> > The maximum amount of memory used by each socket should be
> > controlled by the credit mechanism.
> > The default credit available per-socket is 256 KB, but if we use
> > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > buffers, using up to 1 GB of memory per-socket. In addition, the
> > guest will continue to fill the vring with new 4 KB free buffers
> > to avoid starvation of other sockets.
> > 
> > This patch mitigates this issue copying the payload of small
> > packets (< 128 bytes) into the buffer of last packet queued, in
> > order to avoid wasting memory.
> > 
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> 
> This is good enough for net-next, but for net I think we
> should figure out how to address the issue completely.
> Can we make the accounting precise? What happens to
> performance if we do?
> 

In order to do more precise accounting, maybe we can use the buffer size
instead of the payload size when we update the credit available.
In this way, the credit available for each socket will reflect the
memory actually used.
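
One possible shape of that change (a minimal sketch, not an actual
patch, reusing the existing rx accounting helper):

    static void virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
    					    struct virtio_vsock_pkt *pkt)
    {
    	/* Charge the socket with the whole buffer backing the packet,
    	 * not just the payload, so the credit we advertise reflects
    	 * the memory actually pinned on the receive side.
    	 */
    	vvs->rx_bytes += pkt->buf_len;	/* was: pkt->len */
    }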

I should check this more carefully, because I'm not sure what happens if
the peer sees 1KB of space available and then sends 1KB of payload
(using a 4KB buffer).

The other option is to copy each packet into a new buffer like I did in
the v2 [2], but this forces us to make a copy for every packet that does
not fill the entire buffer, which is perhaps too expensive.

[2] https://patchwork.kernel.org/patch/10938741/


Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 15:36     ` Stefano Garzarella
@ 2019-07-29 15:49       ` Michael S. Tsirkin
  2019-07-29 16:19         ` Stefano Garzarella
  2019-07-29 16:01       ` Michael S. Tsirkin
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-29 15:49 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > Since virtio-vsock was introduced, the buffers filled by the host
> > > and pushed to the guest using the vring, are directly queued in
> > > a per-socket list. These buffers are preallocated by the guest
> > > with a fixed size (4 KB).
> > > 
> > > The maximum amount of memory used by each socket should be
> > > controlled by the credit mechanism.
> > > The default credit available per-socket is 256 KB, but if we use
> > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > guest will continue to fill the vring with new 4 KB free buffers
> > > to avoid starvation of other sockets.
> > > 
> > > This patch mitigates this issue copying the payload of small
> > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > order to avoid wasting memory.
> > > 
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > 
> > This is good enough for net-next, but for net I think we
> > should figure out how to address the issue completely.
> > Can we make the accounting precise? What happens to
> > performance if we do?
> > 
> 
> In order to do more precise accounting maybe we can use the buffer size,
> instead of payload size when we update the credit available.
> In this way, the credit available for each socket will reflect the memory
> actually used.
> 
> I should check better, because I'm not sure what happen if the peer sees
> 1KB of space available, then it sends 1KB of payload (using a 4KB
> buffer).
> 
> The other option is to copy each packet in a new buffer like I did in
> the v2 [2], but this forces us to make a copy for each packet that does
> not fill the entire buffer, perhaps too expensive.
> 
> [2] https://patchwork.kernel.org/patch/10938741/
> 
> 
> Thanks,
> Stefano

Interesting. You are right, and at some level the protocol forces copies.

We could try to detect that the actual memory is getting close to
admin limits and force copies on queued packets after the fact.
Is that practical?

And yes we can extend the credit accounting to include buffer size.
That's a protocol change but maybe it makes sense.

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 15:36     ` Stefano Garzarella
  2019-07-29 15:49       ` Michael S. Tsirkin
@ 2019-07-29 16:01       ` Michael S. Tsirkin
  2019-07-29 16:41         ` Stefano Garzarella
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-29 16:01 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > Since virtio-vsock was introduced, the buffers filled by the host
> > > and pushed to the guest using the vring, are directly queued in
> > > a per-socket list. These buffers are preallocated by the guest
> > > with a fixed size (4 KB).
> > > 
> > > The maximum amount of memory used by each socket should be
> > > controlled by the credit mechanism.
> > > The default credit available per-socket is 256 KB, but if we use
> > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > guest will continue to fill the vring with new 4 KB free buffers
> > > to avoid starvation of other sockets.
> > > 
> > > This patch mitigates this issue copying the payload of small
> > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > order to avoid wasting memory.
> > > 
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > 
> > This is good enough for net-next, but for net I think we
> > should figure out how to address the issue completely.
> > Can we make the accounting precise? What happens to
> > performance if we do?
> > 
> 
> In order to do more precise accounting maybe we can use the buffer size,
> instead of payload size when we update the credit available.
> In this way, the credit available for each socket will reflect the memory
> actually used.
> 
> I should check better, because I'm not sure what happen if the peer sees
> 1KB of space available, then it sends 1KB of payload (using a 4KB
> buffer).
> The other option is to copy each packet in a new buffer like I did in
> the v2 [2], but this forces us to make a copy for each packet that does
> not fill the entire buffer, perhaps too expensive.
> 
> [2] https://patchwork.kernel.org/patch/10938741/
> 

So one thing we can easily do is to under-report the
available credit. E.g. if we copy up to 256 bytes,
then report just 256 bytes for every buffer in the queue.


> 
> Thanks,
> Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 15:49       ` Michael S. Tsirkin
@ 2019-07-29 16:19         ` Stefano Garzarella
  2019-07-29 16:50           ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-29 16:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > > 
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > > 
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > > 
> > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > 
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > > 
> > 
> > In order to do more precise accounting maybe we can use the buffer size,
> > instead of payload size when we update the credit available.
> > In this way, the credit available for each socket will reflect the memory
> > actually used.
> > 
> > I should check better, because I'm not sure what happen if the peer sees
> > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > buffer).
> > 
> > The other option is to copy each packet in a new buffer like I did in
> > the v2 [2], but this forces us to make a copy for each packet that does
> > not fill the entire buffer, perhaps too expensive.
> > 
> > [2] https://patchwork.kernel.org/patch/10938741/
> > 
> > 
> > Thanks,
> > Stefano
> 
> Interesting. You are right, and at some level the protocol forces copies.
> 
> We could try to detect that the actual memory is getting close to
> admin limits and force copies on queued packets after the fact.
> Is that practical?

Yes, I think it is doable!
We can decrease the available credit by the size of the buffers queued,
and when the buffer backing a packet to queue is bigger than the credit
available, we can copy it.
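
A minimal sketch of that fallback (illustrative only, not an actual
patch; it assumes rx_bytes is charged with the full buffer size as
described above):

    spin_lock_bh(&vvs->rx_lock);
    if (vvs->rx_bytes + pkt->buf_len > vvs->buf_alloc) {
    	/* Queueing the original buffer would exceed the per-socket
    	 * allowance: copy the payload into a right-sized buffer and
    	 * release the big one.
    	 */
    	void *buf = kmalloc(pkt->len, GFP_ATOMIC);

    	if (buf) {
    		memcpy(buf, pkt->buf + pkt->off, pkt->len);
    		kfree(pkt->buf);
    		pkt->buf = buf;
    		pkt->buf_len = pkt->len;
    		pkt->off = 0;
    	}
    }
    vvs->rx_bytes += pkt->buf_len;
    list_add_tail(&pkt->list, &vvs->rx_queue);
    spin_unlock_bh(&vvs->rx_lock);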

> 
> And yes we can extend the credit accounting to include buffer size.
> That's a protocol change but maybe it makes sense.

Since we send the available credit to the other peer, maybe this change
can be backwards compatible (I'll check this more carefully).

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 16:01       ` Michael S. Tsirkin
@ 2019-07-29 16:41         ` Stefano Garzarella
  2019-07-29 19:33           ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-29 16:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 12:01:37PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > >
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > >
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > >
> > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > >
> >
> > In order to do more precise accounting maybe we can use the buffer size,
> > instead of payload size when we update the credit available.
> > In this way, the credit available for each socket will reflect the memory
> > actually used.
> >
> > I should check better, because I'm not sure what happen if the peer sees
> > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > buffer).
> > The other option is to copy each packet in a new buffer like I did in
> > the v2 [2], but this forces us to make a copy for each packet that does
> > not fill the entire buffer, perhaps too expensive.
> >
> > [2] https://patchwork.kernel.org/patch/10938741/
> >
>
> So one thing we can easily do is to under-report the
> available credit. E.g. if we copy up to 256bytes,
> then report just 256bytes for every buffer in the queue.
>

Ehm, sorry, I got lost :(
Can you explain it in more detail?


Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 16:19         ` Stefano Garzarella
@ 2019-07-29 16:50           ` Stefano Garzarella
  2019-07-29 19:10             ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-29 16:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > > 
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > > 
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > > 
> > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > 
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > > 
> > > 
> > > In order to do more precise accounting maybe we can use the buffer size,
> > > instead of payload size when we update the credit available.
> > > In this way, the credit available for each socket will reflect the memory
> > > actually used.
> > > 
> > > I should check better, because I'm not sure what happen if the peer sees
> > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > buffer).
> > > 
> > > The other option is to copy each packet in a new buffer like I did in
> > > the v2 [2], but this forces us to make a copy for each packet that does
> > > not fill the entire buffer, perhaps too expensive.
> > > 
> > > [2] https://patchwork.kernel.org/patch/10938741/
> > > 
> > > 
> > > Thanks,
> > > Stefano
> > 
> > Interesting. You are right, and at some level the protocol forces copies.
> > 
> > We could try to detect that the actual memory is getting close to
> > admin limits and force copies on queued packets after the fact.
> > Is that practical?
> 
> Yes, I think it is doable!
> We can decrease the credit available with the buffer size queued, and
> when the buffer size of packet to queue is bigger than the credit
> available, we can copy it.
> 
> > 
> > And yes we can extend the credit accounting to include buffer size.
> > That's a protocol change but maybe it makes sense.
> 
> Since we send to the other peer the credit available, maybe this
> change can be backwards compatible (I'll check better this).

What I said was wrong.

We send to the other peer a counter (increased when the user consumes the
packets) and the "buf_alloc" (the max memory allowed).
The sender then uses the difference between its local counter (increased
when packets are sent) and the remote counter to calculate the credit available:

    u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
    {
    	u32 ret;

    	spin_lock_bh(&vvs->tx_lock);
    	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
    	if (ret > credit)
    		ret = credit;
    	vvs->tx_cnt += ret;
    	spin_unlock_bh(&vvs->tx_lock);

    	return ret;
    }

Maybe I can play with "buf_alloc" to take care of bytes queued but not
used.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 16:50           ` Stefano Garzarella
@ 2019-07-29 19:10             ` Michael S. Tsirkin
  2019-07-30  9:35               ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-29 19:10 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > and pushed to the guest using the vring, are directly queued in
> > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > with a fixed size (4 KB).
> > > > > > 
> > > > > > The maximum amount of memory used by each socket should be
> > > > > > controlled by the credit mechanism.
> > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > to avoid starvation of other sockets.
> > > > > > 
> > > > > > This patch mitigates this issue copying the payload of small
> > > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > > order to avoid wasting memory.
> > > > > > 
> > > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > > 
> > > > > This is good enough for net-next, but for net I think we
> > > > > should figure out how to address the issue completely.
> > > > > Can we make the accounting precise? What happens to
> > > > > performance if we do?
> > > > > 
> > > > 
> > > > In order to do more precise accounting maybe we can use the buffer size,
> > > > instead of payload size when we update the credit available.
> > > > In this way, the credit available for each socket will reflect the memory
> > > > actually used.
> > > > 
> > > > I should check better, because I'm not sure what happen if the peer sees
> > > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > > buffer).
> > > > 
> > > > The other option is to copy each packet in a new buffer like I did in
> > > > the v2 [2], but this forces us to make a copy for each packet that does
> > > > not fill the entire buffer, perhaps too expensive.
> > > > 
> > > > [2] https://patchwork.kernel.org/patch/10938741/
> > > > 
> > > > 
> > > > Thanks,
> > > > Stefano
> > > 
> > > Interesting. You are right, and at some level the protocol forces copies.
> > > 
> > > We could try to detect that the actual memory is getting close to
> > > admin limits and force copies on queued packets after the fact.
> > > Is that practical?
> > 
> > Yes, I think it is doable!
> > We can decrease the credit available with the buffer size queued, and
> > when the buffer size of packet to queue is bigger than the credit
> > available, we can copy it.
> > 
> > > 
> > > And yes we can extend the credit accounting to include buffer size.
> > > That's a protocol change but maybe it makes sense.
> > 
> > Since we send to the other peer the credit available, maybe this
> > change can be backwards compatible (I'll check better this).
> 
> What I said was wrong.
> 
> We send a counter (increased when the user consumes the packets) and the
> "buf_alloc" (the max memory allowed) to the other peer.
> It makes a difference between a local counter (increased when the
> packets are sent) and the remote counter to calculate the credit available:
> 
>     u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
>     {
>     	u32 ret;
> 
>     	spin_lock_bh(&vvs->tx_lock);
>     	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
>     	if (ret > credit)
>     		ret = credit;
>     	vvs->tx_cnt += ret;
>     	spin_unlock_bh(&vvs->tx_lock);
> 
>     	return ret;
>     }
> 
> Maybe I can play with "buf_alloc" to take care of bytes queued but not
> used.
> 
> Thanks,
> Stefano

Right. And the idea behind it all was that if we send a credit
to remote then we have space for it.
I think the basic idea was that if we have actual allocated
memory and can copy data there, then we send the credit to
remote.

Of course that means an extra copy every packet.
So as an optimization, it seems that we just assume
that we will be able to allocate a new buffer.

First, this is not the best we can do. We can actually allocate memory
in the socket before sending credit.
If packet is small then we copy it there.
If packet is big then we queue the packet,
take the buffer out of socket and add it to the virtqueue.

Second question is what to do about medium sized packets.
Packet is 1K but buffer is 4K, what do we do?
And here I wonder - why don't we add the 3K buffer
to the vq?



-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 16:41         ` Stefano Garzarella
@ 2019-07-29 19:33           ` Michael S. Tsirkin
  0 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-29 19:33 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 06:41:27PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 12:01:37PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > In order to do more precise accounting maybe we can use the buffer size,
> > > instead of payload size when we update the credit available.
> > > In this way, the credit available for each socket will reflect the memory
> > > actually used.
> > >
> > > I should check better, because I'm not sure what happen if the peer sees
> > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > buffer).
> > > The other option is to copy each packet in a new buffer like I did in
> > > the v2 [2], but this forces us to make a copy for each packet that does
> > > not fill the entire buffer, perhaps too expensive.
> > >
> > > [2] https://patchwork.kernel.org/patch/10938741/
> > >
> >
> > So one thing we can easily do is to under-report the
> > available credit. E.g. if we copy up to 256bytes,
> > then report just 256bytes for every buffer in the queue.
> >
> 
> Ehm sorry, I got lost :(
> Can you explain better?
> 
> 
> Thanks,
> Stefano

I think I suggested a better idea more recently.
But to clarify this option: we are adding a 4K buffer.
Let's say we know we will always copy 128 bytes.

So we just tell remote we have 128.
If we add another 4K buffer we add another 128 credits.

So we are charging local socket 16x more (4k for a 128 byte packet) but
we are paying remote 16x less (128 credits for 4k byte buffer). It evens
out.

Way less credits to go around so I'm not sure it's a good idea,
at least as the only solution. Can be combined with other
optimizations and probably in a less drastic fashion
(e.g. 2x rather than 16x).


-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 19:10             ` Michael S. Tsirkin
@ 2019-07-30  9:35               ` Stefano Garzarella
  2019-07-30 20:42                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-30  9:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 03:10:15PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > > and pushed to the guest using the vring, are directly queued in
> > > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > > with a fixed size (4 KB).
> > > > > > > 
> > > > > > > The maximum amount of memory used by each socket should be
> > > > > > > controlled by the credit mechanism.
> > > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > > to avoid starvation of other sockets.
> > > > > > > 
> > > > > > > This patch mitigates this issue copying the payload of small
> > > > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > > > order to avoid wasting memory.
> > > > > > > 
> > > > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > 
> > > > > > This is good enough for net-next, but for net I think we
> > > > > > should figure out how to address the issue completely.
> > > > > > Can we make the accounting precise? What happens to
> > > > > > performance if we do?
> > > > > > 
> > > > > 
> > > > > In order to do more precise accounting maybe we can use the buffer size,
> > > > > instead of payload size when we update the credit available.
> > > > > In this way, the credit available for each socket will reflect the memory
> > > > > actually used.
> > > > > 
> > > > > I should check better, because I'm not sure what happen if the peer sees
> > > > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > > > buffer).
> > > > > 
> > > > > The other option is to copy each packet in a new buffer like I did in
> > > > > the v2 [2], but this forces us to make a copy for each packet that does
> > > > > not fill the entire buffer, perhaps too expensive.
> > > > > 
> > > > > [2] https://patchwork.kernel.org/patch/10938741/
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > Stefano
> > > > 
> > > > Interesting. You are right, and at some level the protocol forces copies.
> > > > 
> > > > We could try to detect that the actual memory is getting close to
> > > > admin limits and force copies on queued packets after the fact.
> > > > Is that practical?
> > > 
> > > Yes, I think it is doable!
> > > We can decrease the credit available with the buffer size queued, and
> > > when the buffer size of packet to queue is bigger than the credit
> > > available, we can copy it.
> > > 
> > > > 
> > > > And yes we can extend the credit accounting to include buffer size.
> > > > That's a protocol change but maybe it makes sense.
> > > 
> > > Since we send to the other peer the credit available, maybe this
> > > change can be backwards compatible (I'll check better this).
> > 
> > What I said was wrong.
> > 
> > We send a counter (increased when the user consumes the packets) and the
> > "buf_alloc" (the max memory allowed) to the other peer.
> > It makes a difference between a local counter (increased when the
> > packets are sent) and the remote counter to calculate the credit available:
> > 
> >     u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> >     {
> >     	u32 ret;
> > 
> >     	spin_lock_bh(&vvs->tx_lock);
> >     	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> >     	if (ret > credit)
> >     		ret = credit;
> >     	vvs->tx_cnt += ret;
> >     	spin_unlock_bh(&vvs->tx_lock);
> > 
> >     	return ret;
> >     }
> > 
> > Maybe I can play with "buf_alloc" to take care of bytes queued but not
> > used.
> > 
> > Thanks,
> > Stefano
> 
> Right. And the idea behind it all was that if we send a credit
> to remote then we have space for it.

Yes.

> I think the basic idea was that if we have actual allocated
> memory and can copy data there, then we send the credit to
> remote.
> 
> Of course that means an extra copy every packet.
> So as an optimization, it seems that we just assume
> that we will be able to allocate a new buffer.

Yes, we refill the virtqueue when half of the buffers have been used.

> 
> First this is not the best we can do. We can actually do
> allocate memory in the socket before sending credit.

In this case, IIUC we should allocate an entire buffer (4KB),
so we can reuse it if the packet is big.

> If packet is small then we copy it there.
> If packet is big then we queue the packet,
> take the buffer out of socket and add it to the virtqueue.
> 
> Second question is what to do about medium sized packets.
> Packet is 1K but buffer is 4K, what do we do?
> And here I wonder - why don't we add the 3K buffer
> to the vq?

This would allow us to have accurate credit accounting.

The problem here is compatibility. Before this series, the virtio-vsock
and vhost-vsock modules had the RX buffer size hard-coded
(VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K), so if we send a buffer smaller
than 4K, there might be issues.

Maybe it is time to add 'features' to the virtio-vsock device.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
  2019-07-29 13:59 ` Michael S. Tsirkin
@ 2019-07-30  9:40   ` Stefano Garzarella
  2019-07-30 10:03     ` Jason Wang
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-30  9:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Stefan Hajnoczi, David S. Miller
  Cc: netdev, linux-kernel, virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered an huge use of memory,
> > so I added patch 1 to mitigate this issue. I put it in this series in order
> > to better track the performance trends.
> 
> Series:
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Can this go into net-next?
> 

I think so.
Michael, Stefan, thanks for acking the series!

Should I resend it with the net-next tag?

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
  2019-07-30  9:40   ` Stefano Garzarella
@ 2019-07-30 10:03     ` Jason Wang
  2019-07-30 15:38       ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Jason Wang @ 2019-07-30 10:03 UTC (permalink / raw)
  To: Stefano Garzarella, Michael S. Tsirkin, Stefan Hajnoczi, David S. Miller
  Cc: netdev, linux-kernel, virtualization, kvm


On 2019/7/30 5:40 PM, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:
>> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
>>> This series tries to increase the throughput of virtio-vsock with slight
>>> changes.
>>> While I was testing the v2 of this series I discovered an huge use of memory,
>>> so I added patch 1 to mitigate this issue. I put it in this series in order
>>> to better track the performance trends.
>> Series:
>>
>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> Can this go into net-next?
>>
> I think so.
> Michael, Stefan thanks to ack the series!
>
> Should I resend it with net-next tag?
>
> Thanks,
> Stefano


I think so.

Thanks


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
  2019-07-30 10:03     ` Jason Wang
@ 2019-07-30 15:38       ` Stefano Garzarella
  0 siblings, 0 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-07-30 15:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, David S. Miller, netdev,
	linux-kernel, virtualization, kvm

On Tue, Jul 30, 2019 at 06:03:24PM +0800, Jason Wang wrote:
> 
> On 2019/7/30 5:40 PM, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > > > This series tries to increase the throughput of virtio-vsock with slight
> > > > changes.
> > > > While I was testing the v2 of this series I discovered an huge use of memory,
> > > > so I added patch 1 to mitigate this issue. I put it in this series in order
> > > > to better track the performance trends.
> > > Series:
> > > 
> > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > 
> > > Can this go into net-next?
> > > 
> > I think so.
> > Michael, Stefan thanks to ack the series!
> > 
> > Should I resend it with net-next tag?
> > 
> > Thanks,
> > Stefano
> 
> 
> I think so.

Okay, I'll resend it!

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-30  9:35               ` Stefano Garzarella
@ 2019-07-30 20:42                 ` Michael S. Tsirkin
  2019-08-01 10:47                   ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-07-30 20:42 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 03:10:15PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> > > > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > > > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > > > and pushed to the guest using the vring, are directly queued in
> > > > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > > > with a fixed size (4 KB).
> > > > > > > > 
> > > > > > > > The maximum amount of memory used by each socket should be
> > > > > > > > controlled by the credit mechanism.
> > > > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > > > to avoid starvation of other sockets.
> > > > > > > > 
> > > > > > > > This patch mitigates this issue copying the payload of small
> > > > > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > > > > order to avoid wasting memory.
> > > > > > > > 
> > > > > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > 
> > > > > > > This is good enough for net-next, but for net I think we
> > > > > > > should figure out how to address the issue completely.
> > > > > > > Can we make the accounting precise? What happens to
> > > > > > > performance if we do?
> > > > > > > 
> > > > > > 
> > > > > > In order to do more precise accounting maybe we can use the buffer size,
> > > > > > instead of payload size when we update the credit available.
> > > > > > In this way, the credit available for each socket will reflect the memory
> > > > > > actually used.
> > > > > > 
> > > > > > I should check better, because I'm not sure what happen if the peer sees
> > > > > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > > > > buffer).
> > > > > > 
> > > > > > The other option is to copy each packet in a new buffer like I did in
> > > > > > the v2 [2], but this forces us to make a copy for each packet that does
> > > > > > not fill the entire buffer, perhaps too expensive.
> > > > > > 
> > > > > > [2] https://patchwork.kernel.org/patch/10938741/
> > > > > > 
> > > > > > 
> > > > > > Thanks,
> > > > > > Stefano
> > > > > 
> > > > > Interesting. You are right, and at some level the protocol forces copies.
> > > > > 
> > > > > We could try to detect that the actual memory is getting close to
> > > > > admin limits and force copies on queued packets after the fact.
> > > > > Is that practical?
> > > > 
> > > > Yes, I think it is doable!
> > > > We can decrease the credit available with the buffer size queued, and
> > > > when the buffer size of packet to queue is bigger than the credit
> > > > available, we can copy it.
> > > > 
> > > > > 
> > > > > And yes we can extend the credit accounting to include buffer size.
> > > > > That's a protocol change but maybe it makes sense.
> > > > 
> > > > Since we send to the other peer the credit available, maybe this
> > > > change can be backwards compatible (I'll check better this).
> > > 
> > > What I said was wrong.
> > > 
> > > We send a counter (increased when the user consumes the packets) and the
> > > "buf_alloc" (the max memory allowed) to the other peer.
> > > It makes a difference between a local counter (increased when the
> > > packets are sent) and the remote counter to calculate the credit available:
> > > 
> > >     u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> > >     {
> > >     	u32 ret;
> > > 
> > >     	spin_lock_bh(&vvs->tx_lock);
> > >     	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> > >     	if (ret > credit)
> > >     		ret = credit;
> > >     	vvs->tx_cnt += ret;
> > >     	spin_unlock_bh(&vvs->tx_lock);
> > > 
> > >     	return ret;
> > >     }
> > > 
> > > Maybe I can play with "buf_alloc" to take care of bytes queued but not
> > > used.
> > > 
> > > Thanks,
> > > Stefano
> > 
> > Right. And the idea behind it all was that if we send a credit
> > to remote then we have space for it.
> 
> Yes.
> 
> > I think the basic idea was that if we have actual allocated
> > memory and can copy data there, then we send the credit to
> > remote.
> > 
> > Of course that means an extra copy every packet.
> > So as an optimization, it seems that we just assume
> > that we will be able to allocate a new buffer.
> 
> Yes, we refill the virtqueue when half of the buffers were used.
> 
> > 
> > First this is not the best we can do. We can actually do
> > allocate memory in the socket before sending credit.
> 
> In this case, IIUC we should allocate an entire buffer (4KB),
> so we can reuse it if the packet is big.
> 
> > If packet is small then we copy it there.
> > If packet is big then we queue the packet,
> > take the buffer out of socket and add it to the virtqueue.
> > 
> > Second question is what to do about medium sized packets.
> > Packet is 1K but buffer is 4K, what do we do?
> > And here I wonder - why don't we add the 3K buffer
> > to the vq?
> 
> This would allow us to have an accurate credit account.
> 
> The problem here is the compatibility. Before this series virtio-vsock
> and vhost-vsock modules had the RX buffer size hard-coded
> (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> of 4K, there might be issues.

Shouldn't be if they are following the spec. If not let's fix
the broken parts.

> 
> Maybe it is the time to add add 'features' to virtio-vsock device.
> 
> Thanks,
> Stefano

Why would a remote care about buffer sizes?

Let's first see what the issues are. If they exist
we can either fix the bugs, or code the bug as a feature in spec.

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-30 20:42                 ` Michael S. Tsirkin
@ 2019-08-01 10:47                   ` Stefano Garzarella
  2019-08-01 13:21                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-08-01 10:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:

(...)

> > 
> > The problem here is the compatibility. Before this series virtio-vsock
> > and vhost-vsock modules had the RX buffer size hard-coded
> > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > of 4K, there might be issues.
> 
> Shouldn't be if they are following the spec. If not let's fix
> the broken parts.
> 
> > 
> > Maybe it is the time to add add 'features' to virtio-vsock device.
> > 
> > Thanks,
> > Stefano
> 
> Why would a remote care about buffer sizes?
> 
> Let's first see what the issues are. If they exist
> we can either fix the bugs, or code the bug as a feature in spec.
> 

The vhost_transport '.stream_enqueue' callback
[virtio_transport_stream_enqueue()] calls virtio_transport_send_pkt_info(),
passing the user message. This function allocates a new packet, copying
the user message, but (before this series) it limits the packet size to
VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):

static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
					  struct virtio_vsock_pkt_info *info)
{
 ...
	/* we can send less than pkt_len bytes */
	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;

	/* virtio_transport_get_credit might return less than pkt_len credit */
	pkt_len = virtio_transport_get_credit(vvs, pkt_len);

	/* Do not send zero length OP_RW pkt */
	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
		return pkt_len;
 ...
}

then it queues the packet for the TX worker calling .send_pkt()
[vhost_transport_send_pkt() in the vhost_transport case]

The main function executed by the TX worker is
vhost_transport_do_send_pkt(), which picks up a buffer from the virtqueue
and tries to copy the packet (up to 4K) into it.  If the buffer
allocated by the guest is smaller than 4K, I think the packet will be
discarded here with an error:

static void
vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
				struct vhost_virtqueue *vq)
{
 ...
		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
		if (nbytes != pkt->len) {
			virtio_transport_free_pkt(pkt);
			vq_err(vq, "Faulted on copying pkt buf\n");
			break;
		}
 ...
}


This series changes this behavior: now we split the packet in
vhost_transport_do_send_pkt() depending on the buffer found in the
virtqueue.
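
Roughly, the split looks like this (a simplified sketch of the idea, not
the literal patch; 'pkt->off' tracks how much of the payload has already
been copied):

		/* cap the copy to the space left in the guest buffer and
		 * keep the packet queued until all of its payload is sent
		 */
		size_t buf_avail = iov_length(&vq->iov[out], in) - sizeof(pkt->hdr);
		size_t payload_len = min_t(size_t, pkt->len - pkt->off, buf_avail);

		nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len, &iov_iter);
		if (nbytes != payload_len) {
			virtio_transport_free_pkt(pkt);
			vq_err(vq, "Faulted on copying pkt buf\n");
			break;
		}

		pkt->off += payload_len;
		/* if pkt->off < pkt->len, the packet is put back on the send
		 * list and completed with the next available guest buffer
		 */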

We didn't change the buffer size in this series, so we are still backward
compatible, but if we were to use buffers smaller than 4K, we would hit
the error described above.

How do you suggest we proceed if we want to change the buffer size?
Maybe adding a feature to "support any buffer size"?

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-08-01 10:47                   ` Stefano Garzarella
@ 2019-08-01 13:21                     ` Michael S. Tsirkin
  2019-08-01 13:36                       ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-08-01 13:21 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> 
> (...)
> 
> > > 
> > > The problem here is the compatibility. Before this series virtio-vsock
> > > and vhost-vsock modules had the RX buffer size hard-coded
> > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > of 4K, there might be issues.
> > 
> > Shouldn't be if they are following the spec. If not let's fix
> > the broken parts.
> > 
> > > 
> > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > 
> > > Thanks,
> > > Stefano
> > 
> > Why would a remote care about buffer sizes?
> > 
> > Let's first see what the issues are. If they exist
> > we can either fix the bugs, or code the bug as a feature in spec.
> > 
> 
> The vhost_transport '.stream_enqueue' callback
> [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> passing the user message. This function allocates a new packet, copying
> the user message, but (before this series) it limits the packet size to
> the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> 
> static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 					  struct virtio_vsock_pkt_info *info)
> {
>  ...
> 	/* we can send less than pkt_len bytes */
> 	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> 		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> 
> 	/* virtio_transport_get_credit might return less than pkt_len credit */
> 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> 
> 	/* Do not send zero length OP_RW pkt */
> 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> 		return pkt_len;
>  ...
> }
> 
> then it queues the packet for the TX worker calling .send_pkt()
> [vhost_transport_send_pkt() in the vhost_transport case]
> 
> The main function executed by the TX worker is
> vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> and it tries to copy the packet (up to 4K) on it.  If the buffer
> allocated from the guest will be smaller then 4K, I think here it will
> be discarded with an error:
> 
> static void
> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> 				struct vhost_virtqueue *vq)
> {
>  ...
> 		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);

isn't pkt len the actual length though?

> 		if (nbytes != pkt->len) {
> 			virtio_transport_free_pkt(pkt);
> 			vq_err(vq, "Faulted on copying pkt buf\n");
> 			break;
> 		}
>  ...
> }
> 
> 
> This series changes this behavior since now we will split the packet in
> vhost_transport_do_send_pkt() depending on the buffer found in the
> virtqueue.
> 
> We didn't change the buffer size in this series, so we still backward
> compatible, but if we will use buffers smaller than 4K, we should
> encounter the error described above.
> 
> How do you suggest we proceed if we want to change the buffer size?
> Maybe adding a feature to "support any buffer size"?
> 
> Thanks,
> Stefano



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-08-01 13:21                     ` Michael S. Tsirkin
@ 2019-08-01 13:36                       ` Stefano Garzarella
  2019-09-01  8:26                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-08-01 13:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > 
> > (...)
> > 
> > > > 
> > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > of 4K, there might be issues.
> > > 
> > > Shouldn't be if they are following the spec. If not let's fix
> > > the broken parts.
> > > 
> > > > 
> > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > > 
> > > > Thanks,
> > > > Stefano
> > > 
> > > Why would a remote care about buffer sizes?
> > > 
> > > Let's first see what the issues are. If they exist
> > > we can either fix the bugs, or code the bug as a feature in spec.
> > > 
> > 
> > The vhost_transport '.stream_enqueue' callback
> > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > passing the user message. This function allocates a new packet, copying
> > the user message, but (before this series) it limits the packet size to
> > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > 
> > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > 					  struct virtio_vsock_pkt_info *info)
> > {
> >  ...
> > 	/* we can send less than pkt_len bytes */
> > 	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > 		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > 
> > 	/* virtio_transport_get_credit might return less than pkt_len credit */
> > 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > 
> > 	/* Do not send zero length OP_RW pkt */
> > 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > 		return pkt_len;
> >  ...
> > }
> > 
> > then it queues the packet for the TX worker calling .send_pkt()
> > [vhost_transport_send_pkt() in the vhost_transport case]
> > 
> > The main function executed by the TX worker is
> > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > and it tries to copy the packet (up to 4K) on it.  If the buffer
> > allocated from the guest will be smaller then 4K, I think here it will
> > be discarded with an error:
> > 

I'm adding more lines to explain better.

> > static void
> > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > 				struct vhost_virtqueue *vq)
> > {
		...

		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
					 &out, &in, NULL, NULL);

		...

		len = iov_length(&vq->iov[out], in);
		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);

		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
		if (nbytes != sizeof(pkt->hdr)) {
			virtio_transport_free_pkt(pkt);
			vq_err(vq, "Faulted on copying pkt hdr\n");
			break;
		}

> >  ...
> > 		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> 
> isn't pck len the actual length though?
> 

It is the length of the packet that we are copying into the guest RX
buffers pointed to by the iov_iter. The guest allocates an iovec with 2
buffers, one for the header and one for the payload (4KB).
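
For reference, this is roughly how the guest posts those buffers
(simplified from virtio_vsock_rx_fill() in the guest driver, so details
may differ slightly):

	struct virtio_vsock_pkt *pkt;
	struct scatterlist hdr, buf, *sgs[2];

	pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
	pkt->buf = kmalloc(VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE, GFP_KERNEL);
	pkt->len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;

	/* 2-element chain: one entry for the header, one for the 4 KB payload */
	sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
	sgs[0] = &hdr;
	sg_init_one(&buf, pkt->buf, VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE);
	sgs[1] = &buf;

	virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);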

> > 		if (nbytes != pkt->len) {
> > 			virtio_transport_free_pkt(pkt);
> > 			vq_err(vq, "Faulted on copying pkt buf\n");
> > 			break;
> > 		}
> >  ...
> > }
> > 
> > 
> > This series changes this behavior since now we will split the packet in
> > vhost_transport_do_send_pkt() depending on the buffer found in the
> > virtqueue.
> > 
> > We didn't change the buffer size in this series, so we still backward
> > compatible, but if we will use buffers smaller than 4K, we should
> > encounter the error described above.
> > 
> > How do you suggest we proceed if we want to change the buffer size?
> > Maybe adding a feature to "support any buffer size"?
> > 
> > Thanks,
> > Stefano
> 
> 

-- 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-07-29 14:04   ` Michael S. Tsirkin
  2019-07-29 15:36     ` Stefano Garzarella
@ 2019-08-30  9:40     ` Stefano Garzarella
  2019-09-01  6:56       ` Michael S. Tsirkin
  1 sibling, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-08-30  9:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Stefan Hajnoczi
  Cc: netdev, linux-kernel, David S. Miller, virtualization, Jason Wang, kvm

On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > Since virtio-vsock was introduced, the buffers filled by the host
> > and pushed to the guest using the vring, are directly queued in
> > a per-socket list. These buffers are preallocated by the guest
> > with a fixed size (4 KB).
> > 
> > The maximum amount of memory used by each socket should be
> > controlled by the credit mechanism.
> > The default credit available per-socket is 256 KB, but if we use
> > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > buffers, using up to 1 GB of memory per-socket. In addition, the
> > guest will continue to fill the vring with new 4 KB free buffers
> > to avoid starvation of other sockets.
> > 
> > This patch mitigates this issue copying the payload of small
> > packets (< 128 bytes) into the buffer of last packet queued, in
> > order to avoid wasting memory.
> > 
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> 
> This is good enough for net-next, but for net I think we
> should figure out how to address the issue completely.
> Can we make the accounting precise? What happens to
> performance if we do?
> 

Since I'm back from holidays, I'm restarting this thread to figure out
how to address the issue completely.

I did a better analysis of the credit mechanism that we implemented in
virtio-vsock to get a clearer view, and I'd like to share it with you:

    This issue affects only the "host->guest" path. In this case, when the
    host wants to send a packet to the guest, it uses a "free" buffer
    allocated by the guest (4KB).
    The "free" buffers available for the host are shared between all
    sockets; the credit mechanism, instead, is per-socket, I think to
    avoid the starvation of other sockets.
    The guest re-fills the "free" queue when the available buffers are
    less than half.

    Each peer has these variables in the per-socket state:
       /* local vars */
       buf_alloc        /* max bytes usable by this socket
                           [exposed to the other peer] */
       fwd_cnt          /* increased when RX packet is consumed by the
                           user space [exposed to the other peer] */
       tx_cnt 	        /* increased when TX packet is sent to the other peer */

       /* remote vars  */
       peer_buf_alloc   /* peer's buf_alloc */
       peer_fwd_cnt     /* peer's fwd_cnt */

    When a peer sends a packet, it increases the 'tx_cnt'; when the
    receiver consumes the packet (copies it to the user-space buffer), it
    increases the 'fwd_cnt'.
    Note: the increments use the payload length, not the
    buffer length.

    The values of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
    all packet headers or with an explicit CREDIT_UPDATE packet.

    The local 'buf_alloc' value can be modified by user space using
    setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.

    Before sending a packet, the peer checks the space available:
    	credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
    and it will send up to credit_available bytes to the other peer.
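
    For example (numbers just for illustration): with peer_buf_alloc = 256 KB,
    tx_cnt = 300 KB, and peer_fwd_cnt = 100 KB, the sender can still push
    256 KB - (300 KB - 100 KB) = 56 KB before it has to wait for a credit update.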

Possible solutions considering Michael's advice:
1. Use the buffer length instead of the payload length when we increment
   the counters:
  - This approach accounts precisely for the memory used per socket.
  - This requires changes in both guest and host.
  - It is not compatible with old drivers, so a feature should be negotiated.

2. Decrease the advertised 'buf_alloc', taking into account the bytes queued
   in the socket queue but not yet used (e.g. 256 bytes used of the 4K
   available in the buffer):
  - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
  - This should also be compatible with old drivers.

Maybe the second is less invasive, but will it be too tricky?
Any other advice or suggestions?
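
To make option 2 concrete, here is a minimal sketch (the helper and the
'rx_bytes_unused' counter are made-up names for illustration, not existing
fields):

/* Option 2 sketch: advertise a reduced buf_alloc that accounts for the
 * buffer bytes sitting in the RX queue without carrying payload
 * (e.g. a 4K buffer holding only 256 bytes contributes 4K - 256).
 * 'rx_bytes_unused' is a hypothetical per-socket counter, updated when
 * packets are queued and consumed.
 */
static u32 virtio_transport_adjusted_buf_alloc(struct virtio_vsock_sock *vvs)
{
	if (vvs->rx_bytes_unused >= vvs->buf_alloc)
		return 0;

	return vvs->buf_alloc - vvs->rx_bytes_unused;
}

pkt->hdr.buf_alloc would then be filled from this helper instead of from
vvs->buf_alloc directly.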

Thanks in advance,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-08-30  9:40     ` Stefano Garzarella
@ 2019-09-01  6:56       ` Michael S. Tsirkin
  2019-09-02  8:39         ` Stefan Hajnoczi
  2019-10-11 13:40         ` Stefano Garzarella
  0 siblings, 2 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-09-01  6:56 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, netdev, linux-kernel, David S. Miller,
	virtualization, Jason Wang, kvm

On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > Since virtio-vsock was introduced, the buffers filled by the host
> > > and pushed to the guest using the vring, are directly queued in
> > > a per-socket list. These buffers are preallocated by the guest
> > > with a fixed size (4 KB).
> > > 
> > > The maximum amount of memory used by each socket should be
> > > controlled by the credit mechanism.
> > > The default credit available per-socket is 256 KB, but if we use
> > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > guest will continue to fill the vring with new 4 KB free buffers
> > > to avoid starvation of other sockets.
> > > 
> > > This patch mitigates this issue copying the payload of small
> > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > order to avoid wasting memory.
> > > 
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > 
> > This is good enough for net-next, but for net I think we
> > should figure out how to address the issue completely.
> > Can we make the accounting precise? What happens to
> > performance if we do?
> > 
> 
> Since I'm back from holidays, I'm restarting this thread to figure out
> how to address the issue completely.
> 
> I did a better analysis of the credit mechanism that we implemented in
> virtio-vsock to get a clearer view and I'd share it with you:
> 
>     This issue affect only the "host->guest" path. In this case, when the
>     host wants to send a packet to the guest, it uses a "free" buffer
>     allocated by the guest (4KB).
>     The "free" buffers available for the host are shared between all
>     sockets, instead, the credit mechanism is per-socket, I think to
>     avoid the starvation of others sockets.
>     The guests re-fill the "free" queue when the available buffers are
>     less than half.
> 
>     Each peer have these variables in the per-socket state:
>        /* local vars */
>        buf_alloc        /* max bytes usable by this socket
>                            [exposed to the other peer] */
>        fwd_cnt          /* increased when RX packet is consumed by the
>                            user space [exposed to the other peer] */
>        tx_cnt 	        /* increased when TX packet is sent to the other peer */
> 
>        /* remote vars  */
>        peer_buf_alloc   /* peer's buf_alloc */
>        peer_fwd_cnt     /* peer's fwd_cnt */
> 
>     When a peer sends a packet, it increases the 'tx_cnt'; when the
>     receiver consumes the packet (copy it to the user-space buffer), it
>     increases the 'fwd_cnt'.
>     Note: increments are made considering the payload length and not the
>     buffer length.
> 
>     The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
>     all packet headers or with an explicit CREDIT_UPDATE packet.
> 
>     The local 'buf_alloc' value can be modified by the user space using
>     setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> 
>     Before to send a packet, the peer checks the space available:
>     	credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
>     and it will send up to credit_available bytes to the other peer.
> 
> Possible solutions considering Michael's advice:
> 1. Use the buffer length instead of the payload length when we increment
>    the counters:
>   - This approach will account precisely the memory used per socket.
>   - This requires changes in both guest and host.
>   - It is not compatible with old drivers, so a feature should be negotiated.
> 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
>    the socket queue but not used. (e.g. 256 byte used on 4K available in
>    the buffer)
>   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
>   - This should be compatible also with old drivers.
> 
> Maybe the second is less invasive, but will it be too tricky?
> Any other advice or suggestions?
> 
> Thanks in advance,
> Stefano

OK let me try to clarify.  The idea is this:

Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
means that in the worst case (128 byte packets), each byte of credit in
the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
to also account for the virtio_vsock_pkt since I think it's kept around
until userspace consumes it.

Thus given X buf alloc allowed in the socket, we should publish X/16
credits to the other side. This will ensure the other side does not send
more than X/16 bytes for a given socket and thus we won't need to
allocate more than X bytes to hold the data.

We can play with the copy break value to tweak this.
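
As a back-of-the-envelope sketch of that rule (the helper below is invented
for illustration, it is not kernel code):

/* Publish a scaled-down credit so that the worst case (every packet just
 * above the copy threshold, each one pinning a whole RX buffer plus its
 * virtio_vsock_pkt) stays within buf_alloc.
 */
static u32 published_credit(u32 buf_alloc, u32 buf_size, u32 copy_break)
{
	u32 worst_case_ratio = buf_size / copy_break;

	return buf_alloc / worst_case_ratio;
}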




^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-08-01 13:36                       ` Stefano Garzarella
@ 2019-09-01  8:26                         ` Michael S. Tsirkin
  2019-09-01 10:17                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-09-01  8:26 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > 
> > > (...)
> > > 
> > > > > 
> > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > of 4K, there might be issues.
> > > > 
> > > > Shouldn't be if they are following the spec. If not let's fix
> > > > the broken parts.
> > > > 
> > > > > 
> > > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > > > 
> > > > > Thanks,
> > > > > Stefano
> > > > 
> > > > Why would a remote care about buffer sizes?
> > > > 
> > > > Let's first see what the issues are. If they exist
> > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > 
> > > 
> > > The vhost_transport '.stream_enqueue' callback
> > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > passing the user message. This function allocates a new packet, copying
> > > the user message, but (before this series) it limits the packet size to
> > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > 
> > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > 					  struct virtio_vsock_pkt_info *info)
> > > {
> > >  ...
> > > 	/* we can send less than pkt_len bytes */
> > > 	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > 		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > 
> > > 	/* virtio_transport_get_credit might return less than pkt_len credit */
> > > 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > 
> > > 	/* Do not send zero length OP_RW pkt */
> > > 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > 		return pkt_len;
> > >  ...
> > > }
> > > 
> > > then it queues the packet for the TX worker calling .send_pkt()
> > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > 
> > > The main function executed by the TX worker is
> > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > and it tries to copy the packet (up to 4K) on it.  If the buffer
> > > allocated from the guest will be smaller then 4K, I think here it will
> > > be discarded with an error:
> > > 
> 
> I'm adding more lines to explain better.
> 
> > > static void
> > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > 				struct vhost_virtqueue *vq)
> > > {
> 		...
> 
> 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> 					 &out, &in, NULL, NULL);
> 
> 		...
> 
> 		len = iov_length(&vq->iov[out], in);
> 		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> 
> 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> 		if (nbytes != sizeof(pkt->hdr)) {
> 			virtio_transport_free_pkt(pkt);
> 			vq_err(vq, "Faulted on copying pkt hdr\n");
> 			break;
> 		}
> 
> > >  ...
> > > 		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > 
> > isn't pck len the actual length though?
> > 
> 
> It is the length of the packet that we are copying in the guest RX
> buffers pointed by the iov_iter. The guest allocates an iovec with 2
> buffers, one for the header and one for the payload (4KB).

BTW at the moment that forces another kmalloc within virtio core. Maybe
vsock needs a flag to skip allocation in this case.  Worth benchmarking.
See virtqueue_use_indirect which just does total_sg > 1.
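
For reference, the check in drivers/virtio/virtio_ring.c is roughly this
(quoted from memory, details may vary between kernel versions):

static bool virtqueue_use_indirect(struct virtqueue *_vq,
				   unsigned int total_sg)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	/* any multi-element sg chain goes through an indirect descriptor
	 * table, i.e. an extra allocation per buffer
	 */
	return (vq->indirect && total_sg > 1 && vq->vq.num_free);
}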

> 
> > > 		if (nbytes != pkt->len) {
> > > 			virtio_transport_free_pkt(pkt);
> > > 			vq_err(vq, "Faulted on copying pkt buf\n");
> > > 			break;
> > > 		}
> > >  ...
> > > }
> > > 
> > > 
> > > This series changes this behavior since now we will split the packet in
> > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > virtqueue.
> > > 
> > > We didn't change the buffer size in this series, so we still backward
> > > compatible, but if we will use buffers smaller than 4K, we should
> > > encounter the error described above.

So that's an implementation bug then? It made an assumption
of a 4K sized buffer? Or even PAGE_SIZE sized buffer?


> > > 
> > > How do you suggest we proceed if we want to change the buffer size?
> > > Maybe adding a feature to "support any buffer size"?
> > > 
> > > Thanks,
> > > Stefano
> > 
> > 
> 
> -- 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-01  8:26                         ` Michael S. Tsirkin
@ 2019-09-01 10:17                           ` Michael S. Tsirkin
  2019-09-02  9:57                             ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-09-01 10:17 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Sun, Sep 01, 2019 at 04:26:19AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> > On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > > 
> > > > (...)
> > > > 
> > > > > > 
> > > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > > than 4K, there might be issues.
> > > > > 
> > > > > Shouldn't be if they are following the spec. If not let's fix
> > > > > the broken parts.
> > > > > 
> > > > > > 
> > > > > > Maybe it is time to add 'features' to the virtio-vsock device.
> > > > > > 
> > > > > > Thanks,
> > > > > > Stefano
> > > > > 
> > > > > Why would a remote care about buffer sizes?
> > > > > 
> > > > > Let's first see what the issues are. If they exist
> > > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > > 
> > > > 
> > > > The vhost_transport '.stream_enqueue' callback
> > > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > > passing the user message. This function allocates a new packet, copying
> > > > the user message, but (before this series) it limits the packet size to
> > > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > > 
> > > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > 					  struct virtio_vsock_pkt_info *info)
> > > > {
> > > >  ...
> > > > 	/* we can send less than pkt_len bytes */
> > > > 	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > > 		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > > 
> > > > 	/* virtio_transport_get_credit might return less than pkt_len credit */
> > > > 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > 
> > > > 	/* Do not send zero length OP_RW pkt */
> > > > 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > 		return pkt_len;
> > > >  ...
> > > > }
> > > > 
> > > > then it queues the packet for the TX worker calling .send_pkt()
> > > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > > 
> > > > The main function executed by the TX worker is
> > > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > > and it tries to copy the packet (up to 4K) into it.  If the buffer
> > > > allocated from the guest is smaller than 4K, I think here it will
> > > > be discarded with an error:
> > > > 
> > 
> > I'm adding more lines to explain better.
> > 
> > > > static void
> > > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > > 				struct vhost_virtqueue *vq)
> > > > {
> > 		...
> > 
> > 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> > 					 &out, &in, NULL, NULL);
> > 
> > 		...
> > 
> > 		len = iov_length(&vq->iov[out], in);
> > 		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> > 
> > 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> > 		if (nbytes != sizeof(pkt->hdr)) {
> > 			virtio_transport_free_pkt(pkt);
> > 			vq_err(vq, "Faulted on copying pkt hdr\n");
> > 			break;
> > 		}
> > 
> > > >  ...
> > > > 		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > > 
> > > isn't pkt len the actual length though?
> > > 
> > 
> > It is the length of the packet that we are copying in the guest RX
> > buffers pointed by the iov_iter. The guest allocates an iovec with 2
> > buffers, one for the header and one for the payload (4KB).
> 
> BTW at the moment that forces another kmalloc within virtio core. Maybe
> vsock needs a flag to skip allocation in this case.  Worth benchmarking.
> See virtqueue_use_indirect which just does total_sg > 1.
> 
> > 
> > > > 		if (nbytes != pkt->len) {
> > > > 			virtio_transport_free_pkt(pkt);
> > > > 			vq_err(vq, "Faulted on copying pkt buf\n");
> > > > 			break;
> > > > 		}
> > > >  ...
> > > > }
> > > > 
> > > > 
> > > > This series changes this behavior since now we will split the packet in
> > > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > > virtqueue.
> > > > 
> > > > We didn't change the buffer size in this series, so we are still backward
> > > > compatible, but if we use buffers smaller than 4K, we should
> > > > encounter the error described above.
> 
> So that's an implementation bug then? It made an assumption
> of a 4K sized buffer? Or even PAGE_SIZE sized buffer?

Assuming we miss nothing and buffers < 4K are broken,
I think we need to add this to the spec, possibly with
a feature bit to relax the requirement that all buffers
are at least 4k in size.
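
Just to illustrate the kind of thing I mean -- the feature name, bit number
and variables below are made up for this mail, nothing like it is defined in
the spec or in this series:

/* HYPOTHETICAL feature bit: device can handle RX buffers of any size */
#define VIRTIO_VSOCK_F_ANY_BUF_SIZE	3

/* driver (guest) side: only post RX buffers smaller than 4 KB when the
 * device has acked the feature, otherwise keep the current 4 KB size */
buf_len = virtio_has_feature(vdev, VIRTIO_VSOCK_F_ANY_BUF_SIZE) ?
	  small_buf_len : VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;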

> 
> > > > 
> > > > How do you suggest we proceed if we want to change the buffer size?
> > > > Maybe adding a feature to "support any buffer size"?
> > > > 
> > > > Thanks,
> > > > Stefano
> > > 
> > > 
> > 
> > -- 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-01  6:56       ` Michael S. Tsirkin
@ 2019-09-02  8:39         ` Stefan Hajnoczi
  2019-09-02  8:55           ` Stefano Garzarella
  2019-10-11 13:40         ` Stefano Garzarella
  1 sibling, 1 reply; 69+ messages in thread
From: Stefan Hajnoczi @ 2019-09-02  8:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefano Garzarella, kvm, netdev, linux-kernel, virtualization,
	Stefan Hajnoczi, David S. Miller


On Sun, Sep 01, 2019 at 02:56:44AM -0400, Michael S. Tsirkin wrote:
> On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > > 
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > > 
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > > 
> > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > 
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > > 
> > 
> > Since I'm back from holidays, I'm restarting this thread to figure out
> > how to address the issue completely.
> > 
> > I did a better analysis of the credit mechanism that we implemented in
> > virtio-vsock to get a clearer view and I'd share it with you:
> > 
> >     This issue affects only the "host->guest" path. In this case, when the
> >     host wants to send a packet to the guest, it uses a "free" buffer
> >     allocated by the guest (4KB).
> >     The "free" buffers available for the host are shared between all
> >     sockets, instead, the credit mechanism is per-socket, I think to
> >     avoid the starvation of other sockets.
> >     The guest re-fills the "free" queue when the available buffers are
> >     less than half.
> > 
> >     Each peer has these variables in the per-socket state:
> >        /* local vars */
> >        buf_alloc        /* max bytes usable by this socket
> >                            [exposed to the other peer] */
> >        fwd_cnt          /* increased when RX packet is consumed by the
> >                            user space [exposed to the other peer] */
> >        tx_cnt 	        /* increased when TX packet is sent to the other peer */
> > 
> >        /* remote vars  */
> >        peer_buf_alloc   /* peer's buf_alloc */
> >        peer_fwd_cnt     /* peer's fwd_cnt */
> > 
> >     When a peer sends a packet, it increases the 'tx_cnt'; when the
> >     receiver consumes the packet (copy it to the user-space buffer), it
> >     increases the 'fwd_cnt'.
> >     Note: increments are made considering the payload length and not the
> >     buffer length.
> > 
> >     The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> >     all packet headers or with an explicit CREDIT_UPDATE packet.
> > 
> >     The local 'buf_alloc' value can be modified by the user space using
> >     setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > 
> >     Before sending a packet, the peer checks the space available:
> >     	credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> >     and it will send up to credit_available bytes to the other peer.
> > 
> > Possible solutions considering Michael's advice:
> > 1. Use the buffer length instead of the payload length when we increment
> >    the counters:
> >   - This approach will precisely account for the memory used per socket.
> >   - This requires changes in both guest and host.
> >   - It is not compatible with old drivers, so a feature should be negotiated.
> > 2. Decrease the advertised 'buf_alloc' taking into account the bytes queued
> >    in the socket queue but not used (e.g. 256 bytes used out of the 4K
> >    available in the buffer)
> >   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> >   - This should be compatible also with old drivers.
> > 
> > Maybe the second is less invasive, but will it be too tricky?
> > Any other advice or suggestions?
> > 
> > Thanks in advance,
> > Stefano
> 
> OK let me try to clarify.  The idea is this:
> 
> Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
> means that in the worst case (128 byte packets), each byte of credit in
> the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> to also account for the virtio_vsock_pkt since I think it's kept around
> until userspace consumes it.
> 
> Thus given X buf alloc allowed in the socket, we should publish X/16
> credits to the other side. This will ensure the other side does not send
> more than X/16 bytes for a given socket and thus we won't need to
> allocate more than X bytes to hold the data.
> 
> We can play with the copy break value to tweak this.

This seems like a reasonable solution.  Hopefully the benchmark results
will come out okay too.

Stefan


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-02  8:39         ` Stefan Hajnoczi
@ 2019-09-02  8:55           ` Stefano Garzarella
  0 siblings, 0 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-09-02  8:55 UTC (permalink / raw)
  To: Michael S. Tsirkin, Stefan Hajnoczi
  Cc: kvm, netdev, linux-kernel, virtualization, Stefan Hajnoczi,
	David S. Miller

On Mon, Sep 02, 2019 at 09:39:12AM +0100, Stefan Hajnoczi wrote:
> On Sun, Sep 01, 2019 at 02:56:44AM -0400, Michael S. Tsirkin wrote:
> > 
> > OK let me try to clarify.  The idea is this:
> > 
> > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
> > means that in the worst case (128 byte packets), each byte of credit in
> > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> > to also account for the virtio_vsock_pkt since I think it's kept around
> > until userspace consumes it.
> > 
> > Thus given X buf alloc allowed in the socket, we should publish X/16
> > credits to the other side. This will ensure the other side does not send
> > more than X/16 bytes for a given socket and thus we won't need to
> > allocate more than X bytes to hold the data.
> > 
> > We can play with the copy break value to tweak this.

Thanks Michael, now it is perfectly clear. It seems an excellent solution and
easy to implement. I'll work on that.

> 
> This seems like a reasonable solution.  Hopefully the benchmark results
> will come out okay too.

Yes, as Michael suggested, I'll play with the copy break value to see how
the benchmark is affected.
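
To make it concrete, this is roughly the shape I have in mind, on top of
virtio_transport_inc_tx_pkt() where buf_alloc is advertised (sketch only, not
part of this series; the divisor 16 comes from your example and the real
value will depend on the copy break threshold):

void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs,
				 struct virtio_vsock_pkt *pkt)
{
	spin_lock_bh(&vvs->tx_lock);
	vvs->last_fwd_cnt = vvs->fwd_cnt;
	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
	/* advertise a fraction of buf_alloc so that worst-case small
	 * packets cannot pin more kernel memory than buf_alloc itself */
	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc / 16);
	spin_unlock_bh(&vvs->tx_lock);
}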

Thank you very much,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-01 10:17                           ` Michael S. Tsirkin
@ 2019-09-02  9:57                             ` Stefano Garzarella
  2019-09-02 15:23                               ` Michael S. Tsirkin
  2019-09-03  4:39                               ` Michael S. Tsirkin
  0 siblings, 2 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-09-02  9:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Sun, Sep 01, 2019 at 06:17:58AM -0400, Michael S. Tsirkin wrote:
> On Sun, Sep 01, 2019 at 04:26:19AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> > > On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > > > 
> > > > > (...)
> > > > > 
> > > > > > > 
> > > > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > > > than 4K, there might be issues.
> > > > > > 
> > > > > > Shouldn't be if they are following the spec. If not let's fix
> > > > > > the broken parts.
> > > > > > 
> > > > > > > 
> > > > > > > Maybe it is time to add 'features' to the virtio-vsock device.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Stefano
> > > > > > 
> > > > > > Why would a remote care about buffer sizes?
> > > > > > 
> > > > > > Let's first see what the issues are. If they exist
> > > > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > > > 
> > > > > 
> > > > > The vhost_transport '.stream_enqueue' callback
> > > > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > > > passing the user message. This function allocates a new packet, copying
> > > > > the user message, but (before this series) it limits the packet size to
> > > > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > > > 
> > > > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > > 					  struct virtio_vsock_pkt_info *info)
> > > > > {
> > > > >  ...
> > > > > 	/* we can send less than pkt_len bytes */
> > > > > 	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > > > 		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > > > 
> > > > > 	/* virtio_transport_get_credit might return less than pkt_len credit */
> > > > > 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > > 
> > > > > 	/* Do not send zero length OP_RW pkt */
> > > > > 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > > 		return pkt_len;
> > > > >  ...
> > > > > }
> > > > > 
> > > > > then it queues the packet for the TX worker calling .send_pkt()
> > > > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > > > 
> > > > > The main function executed by the TX worker is
> > > > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > > > and it tries to copy the packet (up to 4K) into it.  If the buffer
> > > > > allocated from the guest is smaller than 4K, I think here it will
> > > > > be discarded with an error:
> > > > > 
> > > 
> > > I'm adding more lines to explain better.
> > > 
> > > > > static void
> > > > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > > > 				struct vhost_virtqueue *vq)
> > > > > {
> > > 		...
> > > 
> > > 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> > > 					 &out, &in, NULL, NULL);
> > > 
> > > 		...
> > > 
> > > 		len = iov_length(&vq->iov[out], in);
> > > 		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> > > 
> > > 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> > > 		if (nbytes != sizeof(pkt->hdr)) {
> > > 			virtio_transport_free_pkt(pkt);
> > > 			vq_err(vq, "Faulted on copying pkt hdr\n");
> > > 			break;
> > > 		}
> > > 
> > > > >  ...
> > > > > 		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > > > 
> > > > isn't pkt len the actual length though?
> > > > 
> > > 
> > > It is the length of the packet that we are copying in the guest RX
> > > buffers pointed by the iov_iter. The guest allocates an iovec with 2
> > > buffers, one for the header and one for the payload (4KB).
> > 
> > BTW at the moment that forces another kmalloc within virtio core. Maybe
> > vsock needs a flag to skip allocation in this case.  Worth benchmarking.
> > See virtqueue_use_indirect which just does total_sg > 1.

Okay, I'll take a look at virtqueue_use_indirect and I'll do some
benchmarking.

> > 
> > > 
> > > > > 		if (nbytes != pkt->len) {
> > > > > 			virtio_transport_free_pkt(pkt);
> > > > > 			vq_err(vq, "Faulted on copying pkt buf\n");
> > > > > 			break;
> > > > > 		}
> > > > >  ...
> > > > > }
> > > > > 
> > > > > 
> > > > > This series changes this behavior since now we will split the packet in
> > > > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > > > virtqueue.
> > > > > 
> > > > > We didn't change the buffer size in this series, so we are still backward
> > > > > compatible, but if we use buffers smaller than 4K, we should
> > > > > encounter the error described above.
> > 
> > So that's an implementation bug then? It made an assumption
> > of a 4K sized buffer? Or even PAGE_SIZE sized buffer?

Yes, I think it made an assumption and it used this macro as a limit:

include/linux/virtio_vsock.h:13:
    #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE        (1024 * 4)
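
The guest side then pre-fills the RX virtqueue with buffers of exactly that
size. Roughly (paraphrasing virtio_vsock_rx_fill() in
net/vmw_vsock/virtio_transport.c from memory, trimmed and without error
handling, so details may differ):

	pkt->buf = kmalloc(VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE, GFP_KERNEL);
	pkt->len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;

	sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
	sgs[0] = &hdr;
	sg_init_one(&buf, pkt->buf, pkt->len);
	sgs[1] = &buf;

	/* every RX descriptor chain is a header plus one 4 KB payload buffer */
	virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);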

> 
> Assuming we miss nothing and buffers < 4K are broken,
> I think we need to add this to the spec, possibly with
> a feature bit to relax the requirement that all buffers
> are at least 4k in size.
> 

Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-02  9:57                             ` Stefano Garzarella
@ 2019-09-02 15:23                               ` Michael S. Tsirkin
  2019-09-03  4:39                               ` Michael S. Tsirkin
  1 sibling, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-09-02 15:23 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> On Sun, Sep 01, 2019 at 06:17:58AM -0400, Michael S. Tsirkin wrote:
> > On Sun, Sep 01, 2019 at 04:26:19AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> > > > On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > > > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > > > > 
> > > > > > (...)
> > > > > > 
> > > > > > > > 
> > > > > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > > > > than 4K, there might be issues.
> > > > > > > 
> > > > > > > Shouldn't be if they are following the spec. If not let's fix
> > > > > > > the broken parts.
> > > > > > > 
> > > > > > > > 
> > > > > > > > Maybe it is time to add 'features' to the virtio-vsock device.
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Stefano
> > > > > > > 
> > > > > > > Why would a remote care about buffer sizes?
> > > > > > > 
> > > > > > > Let's first see what the issues are. If they exist
> > > > > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > > > > 
> > > > > > 
> > > > > > The vhost_transport '.stream_enqueue' callback
> > > > > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > > > > passing the user message. This function allocates a new packet, copying
> > > > > > the user message, but (before this series) it limits the packet size to
> > > > > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > > > > 
> > > > > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > > > 					  struct virtio_vsock_pkt_info *info)
> > > > > > {
> > > > > >  ...
> > > > > > 	/* we can send less than pkt_len bytes */
> > > > > > 	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > > > > 		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > > > > 
> > > > > > 	/* virtio_transport_get_credit might return less than pkt_len credit */
> > > > > > 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > > > 
> > > > > > 	/* Do not send zero length OP_RW pkt */
> > > > > > 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > > > 		return pkt_len;
> > > > > >  ...
> > > > > > }
> > > > > > 
> > > > > > then it queues the packet for the TX worker calling .send_pkt()
> > > > > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > > > > 
> > > > > > The main function executed by the TX worker is
> > > > > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > > > > and it tries to copy the packet (up to 4K) into it.  If the buffer
> > > > > > allocated from the guest is smaller than 4K, I think here it will
> > > > > > be discarded with an error:
> > > > > > 
> > > > 
> > > > I'm adding more lines to explain better.
> > > > 
> > > > > > static void
> > > > > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > > > > 				struct vhost_virtqueue *vq)
> > > > > > {
> > > > 		...
> > > > 
> > > > 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> > > > 					 &out, &in, NULL, NULL);
> > > > 
> > > > 		...
> > > > 
> > > > 		len = iov_length(&vq->iov[out], in);
> > > > 		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> > > > 
> > > > 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> > > > 		if (nbytes != sizeof(pkt->hdr)) {
> > > > 			virtio_transport_free_pkt(pkt);
> > > > 			vq_err(vq, "Faulted on copying pkt hdr\n");
> > > > 			break;
> > > > 		}
> > > > 
> > > > > >  ...
> > > > > > 		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > > > > 
> > > > > isn't pkt len the actual length though?
> > > > > 
> > > > 
> > > > It is the length of the packet that we are copying in the guest RX
> > > > buffers pointed by the iov_iter. The guest allocates an iovec with 2
> > > > buffers, one for the header and one for the payload (4KB).
> > > 
> > > BTW at the moment that forces another kmalloc within virtio core. Maybe
> > > vsock needs a flag to skip allocation in this case.  Worth benchmarking.
> > > See virtqueue_use_indirect which just does total_sg > 1.
> 
> Okay, I'll take a look at virtqueue_use_indirect and I'll do some
> benchmarking.
> 
> > > 
> > > > 
> > > > > > 		if (nbytes != pkt->len) {
> > > > > > 			virtio_transport_free_pkt(pkt);
> > > > > > 			vq_err(vq, "Faulted on copying pkt buf\n");
> > > > > > 			break;
> > > > > > 		}
> > > > > >  ...
> > > > > > }
> > > > > > 
> > > > > > 
> > > > > > This series changes this behavior since now we will split the packet in
> > > > > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > > > > virtqueue.
> > > > > > 
> > > > > > We didn't change the buffer size in this series, so we are still backward
> > > > > > compatible, but if we use buffers smaller than 4K, we should
> > > > > > encounter the error described above.
> > > 
> > > So that's an implementation bug then? It made an assumption
> > > of a 4K sized buffer? Or even PAGE_SIZE sized buffer?
> 
> Yes, I think it made an assumption and it used this macro as a limit:
> 
> include/linux/virtio_vsock.h:13:
>     #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE        (1024 * 4)
> 
> > 
> > Assuming we miss nothing and buffers < 4K are broken,
> > I think we need to add this to the spec, possibly with
> > a feature bit to relax the requirement that all buffers
> > are at least 4k in size.
> > 
> 
> Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?
> 
> Thanks,
> Stefano

virtio-comment is more appropriate for this.

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages
  2019-07-17 11:30 ` [PATCH v4 2/5] vsock/virtio: reduce credit update messages Stefano Garzarella
  2019-07-22  8:36   ` Stefan Hajnoczi
@ 2019-09-03  4:38   ` Michael S. Tsirkin
  2019-09-03  7:31     ` Stefano Garzarella
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-09-03  4:38 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> In order to reduce the number of credit update messages,
> we send them only when the space available seen by the
> transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  include/linux/virtio_vsock.h            |  1 +
>  net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 7d973903f52e..49fc9d20bc43 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -41,6 +41,7 @@ struct virtio_vsock_sock {
>  
>  	/* Protected by rx_lock */
>  	u32 fwd_cnt;
> +	u32 last_fwd_cnt;
>  	u32 rx_bytes;
>  	struct list_head rx_queue;
>  };
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 095221f94786..a85559d4d974 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
>  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
>  {
>  	spin_lock_bh(&vvs->tx_lock);
> +	vvs->last_fwd_cnt = vvs->fwd_cnt;
>  	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
>  	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
>  	spin_unlock_bh(&vvs->tx_lock);
> @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>  	struct virtio_vsock_sock *vvs = vsk->trans;
>  	struct virtio_vsock_pkt *pkt;
>  	size_t bytes, total = 0;
> +	u32 free_space;
>  	int err = -EFAULT;
>  
>  	spin_lock_bh(&vvs->rx_lock);
> @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>  			virtio_transport_free_pkt(pkt);
>  		}
>  	}
> +
> +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> +
>  	spin_unlock_bh(&vvs->rx_lock);
>  
> -	/* Send a credit pkt to peer */
> -	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> -					    NULL);
> +	/* We send a credit update only when the space available seen
> +	 * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE

This is just repeating what code does though.
Please include the *reason* for the condition.
E.g. here's a better comment:

	/* To reduce number of credit update messages,
	 * don't update credits as long as lots of space is available.
	 * Note: the limit chosen here is arbitrary. Setting the limit
	 * too high causes extra messages. Too low causes transmitter
	 * stalls. As stalls are in theory more expensive than extra
	 * messages, we set the limit to a high value. TODO: experiment
	 * with different values.
	 */


> +	 */
> +	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> +		virtio_transport_send_credit_update(vsk,
> +						    VIRTIO_VSOCK_TYPE_STREAM,
> +						    NULL);
> +	}
>  
>  	return total;
>  
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-02  9:57                             ` Stefano Garzarella
  2019-09-02 15:23                               ` Michael S. Tsirkin
@ 2019-09-03  4:39                               ` Michael S. Tsirkin
  2019-09-03  7:45                                 ` Stefano Garzarella
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-09-03  4:39 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > 
> > Assuming we miss nothing and buffers < 4K are broken,
> > I think we need to add this to the spec, possibly with
> > a feature bit to relax the requirement that all buffers
> > are at least 4k in size.
> > 
> 
> Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?

How about we also fix the bug for now?

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages
  2019-09-03  4:38   ` Michael S. Tsirkin
@ 2019-09-03  7:31     ` Stefano Garzarella
  2019-09-03  7:38       ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-09-03  7:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Tue, Sep 03, 2019 at 12:38:02AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> > In order to reduce the number of credit update messages,
> > we send them only when the space available seen by the
> > transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> > 
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> >  include/linux/virtio_vsock.h            |  1 +
> >  net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
> >  2 files changed, 14 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 7d973903f52e..49fc9d20bc43 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -41,6 +41,7 @@ struct virtio_vsock_sock {
> >  
> >  	/* Protected by rx_lock */
> >  	u32 fwd_cnt;
> > +	u32 last_fwd_cnt;
> >  	u32 rx_bytes;
> >  	struct list_head rx_queue;
> >  };
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 095221f94786..a85559d4d974 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> >  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> >  {
> >  	spin_lock_bh(&vvs->tx_lock);
> > +	vvs->last_fwd_cnt = vvs->fwd_cnt;
> >  	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> >  	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> >  	spin_unlock_bh(&vvs->tx_lock);
> > @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> >  	struct virtio_vsock_pkt *pkt;
> >  	size_t bytes, total = 0;
> > +	u32 free_space;
> >  	int err = -EFAULT;
> >  
> >  	spin_lock_bh(&vvs->rx_lock);
> > @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> >  			virtio_transport_free_pkt(pkt);
> >  		}
> >  	}
> > +
> > +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> > +
> >  	spin_unlock_bh(&vvs->rx_lock);
> >  
> > -	/* Send a credit pkt to peer */
> > -	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> > -					    NULL);
> > +	/* We send a credit update only when the space available seen
> > +	 * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
> 
> This is just repeating what code does though.
> Please include the *reason* for the condition.
> E.g. here's a better comment:
> 
> 	/* To reduce number of credit update messages,
> 	 * don't update credits as long as lots of space is available.
> 	 * Note: the limit chosen here is arbitrary. Setting the limit
> 	 * too high causes extra messages. Too low causes transmitter
> 	 * stalls. As stalls are in theory more expensive than extra
> 	 * messages, we set the limit to a high value. TODO: experiment
> 	 * with different values.
> 	 */
> 

Yes, it is better, sorry for that. I'll try to avoid comments that just
repeat the code and explain the reason for the changes instead.

Since this patch is already queued in net-next, should I send another
patch to fix the comment?

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages
  2019-09-03  7:31     ` Stefano Garzarella
@ 2019-09-03  7:38       ` Michael S. Tsirkin
  0 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-09-03  7:38 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Tue, Sep 03, 2019 at 09:31:20AM +0200, Stefano Garzarella wrote:
> On Tue, Sep 03, 2019 at 12:38:02AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> > > In order to reduce the number of credit update messages,
> > > we send them only when the space available seen by the
> > > transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> > > 
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > >  include/linux/virtio_vsock.h            |  1 +
> > >  net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
> > >  2 files changed, 14 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > index 7d973903f52e..49fc9d20bc43 100644
> > > --- a/include/linux/virtio_vsock.h
> > > +++ b/include/linux/virtio_vsock.h
> > > @@ -41,6 +41,7 @@ struct virtio_vsock_sock {
> > >  
> > >  	/* Protected by rx_lock */
> > >  	u32 fwd_cnt;
> > > +	u32 last_fwd_cnt;
> > >  	u32 rx_bytes;
> > >  	struct list_head rx_queue;
> > >  };
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index 095221f94786..a85559d4d974 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> > >  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> > >  {
> > >  	spin_lock_bh(&vvs->tx_lock);
> > > +	vvs->last_fwd_cnt = vvs->fwd_cnt;
> > >  	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> > >  	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > >  	spin_unlock_bh(&vvs->tx_lock);
> > > @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > >  	struct virtio_vsock_pkt *pkt;
> > >  	size_t bytes, total = 0;
> > > +	u32 free_space;
> > >  	int err = -EFAULT;
> > >  
> > >  	spin_lock_bh(&vvs->rx_lock);
> > > @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > >  			virtio_transport_free_pkt(pkt);
> > >  		}
> > >  	}
> > > +
> > > +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> > > +
> > >  	spin_unlock_bh(&vvs->rx_lock);
> > >  
> > > -	/* Send a credit pkt to peer */
> > > -	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> > > -					    NULL);
> > > +	/* We send a credit update only when the space available seen
> > > +	 * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
> > 
> > This is just repeating what code does though.
> > Please include the *reason* for the condition.
> > E.g. here's a better comment:
> > 
> > 	/* To reduce number of credit update messages,
> > 	 * don't update credits as long as lots of space is available.
> > 	 * Note: the limit chosen here is arbitrary. Setting the limit
> > 	 * too high causes extra messages. Too low causes transmitter
> > 	 * stalls. As stalls are in theory more expensive than extra
> > 	 * messages, we set the limit to a high value. TODO: experiment
> > 	 * with different values.
> > 	 */
> > 
> 
> Yes, it is better, sorry for that. I'll try to avoid comments that just
> repeat the code and explain the reason for the changes instead.
> 
> Since this patch is already queued in net-next, should I send another
> patch to fix the comment?
> 
> Thanks,
> Stefano

I just sent a patch like that, pls ack it.

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-03  4:39                               ` Michael S. Tsirkin
@ 2019-09-03  7:45                                 ` Stefano Garzarella
  2019-09-03  7:52                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Stefano Garzarella @ 2019-09-03  7:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Tue, Sep 03, 2019 at 12:39:19AM -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > > 
> > > Assuming we miss nothing and buffers < 4K are broken,
> > > I think we need to add this to the spec, possibly with
> > > a feature bit to relax the requirement that all buffers
> > > are at least 4k in size.
> > > 
> > 
> > Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?
> 
> How about we also fix the bug for now?

This series unintentionally fixes the bug because we are introducing a way
to split the packet depending on the buffer size ([PATCH 4/5] vhost/vsock:
split packets to send using multiple buffers) and we removed the limit
to 4K buffers ([PATCH 5/5] vsock/virtio: change the maximum packet size
allowed).
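
The core of that change is roughly the following (paraphrased, not the exact
hunk from patch 4/5): vhost_transport_do_send_pkt() copies only what fits in
the buffer the guest provided and keeps the packet around until all of
pkt->len has been sent.

	iov_len = iov_length(&vq->iov[out], in);
	payload_len = pkt->len - pkt->off;

	/* copy at most what fits in the guest buffer after the header */
	if (payload_len > iov_len - sizeof(pkt->hdr))
		payload_len = iov_len - sizeof(pkt->hdr);

	nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len, &iov_iter);
	pkt->off += payload_len;

	if (pkt->off < pkt->len) {
		/* partially sent: keep the packet queued and continue with
		 * the next buffer made available by the guest */
	}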

I discovered that there was a bug while we discussed memory accounting.

Do you think it's enough while we introduce the feature bit in the spec?

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-03  7:45                                 ` Stefano Garzarella
@ 2019-09-03  7:52                                   ` Michael S. Tsirkin
  2019-09-03  8:00                                     ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-09-03  7:52 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Tue, Sep 03, 2019 at 09:45:54AM +0200, Stefano Garzarella wrote:
> On Tue, Sep 03, 2019 at 12:39:19AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > > > 
> > > > Assuming we miss nothing and buffers < 4K are broken,
> > > > I think we need to add this to the spec, possibly with
> > > > a feature bit to relax the requirement that all buffers
> > > > are at least 4k in size.
> > > > 
> > > 
> > > Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?
> > 
> > How about we also fix the bug for now?
> 
> > This series unintentionally fixes the bug because we are introducing a way
> to split the packet depending on the buffer size ([PATCH 4/5] vhost/vsock:
> split packets to send using multiple buffers) and we removed the limit
> to 4K buffers ([PATCH 5/5] vsock/virtio: change the maximum packet size
> allowed).
> 
> I discovered that there was a bug while we discussed memory accounting.
> 
> Do you think it's enough while we introduce the feature bit in the spec?
> 
> Thanks,
> Stefano

Well locking is also broken (patch 3/5).  It seems that 3/5 and 4/5 work
by themselves, right?  So how about we ask Dave to send these to stable?
Also, how about 1/5? Also needed for stable?


-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-03  7:52                                   ` Michael S. Tsirkin
@ 2019-09-03  8:00                                     ` Stefano Garzarella
  0 siblings, 0 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-09-03  8:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

On Tue, Sep 03, 2019 at 03:52:24AM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 03, 2019 at 09:45:54AM +0200, Stefano Garzarella wrote:
> > On Tue, Sep 03, 2019 at 12:39:19AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > > > > 
> > > > > Assuming we miss nothing and buffers < 4K are broken,
> > > > > I think we need to add this to the spec, possibly with
> > > > > a feature bit to relax the requirement that all buffers
> > > > > are at least 4k in size.
> > > > > 
> > > > 
> > > > Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?
> > > 
> > > How about we also fix the bug for now?
> > 
> > This series unintentionally fix the bug because we are introducing a way
> > to split the packet depending on the buffer size ([PATCH 4/5] vhost/vsock:
> > split packets to send using multiple buffers) and we removed the limit
> > to 4K buffers ([PATCH 5/5] vsock/virtio: change the maximum packet size
> > allowed).
> > 
> > I discovered that there was a bug while we discussed memory accounting.
> > 
> > Do you think it's enough while we introduce the feature bit in the spec?
> > 
> > Thanks,
> > Stefano
> 
> Well locking is also broken (patch 3/5).  It seems that 3/5 and 4/5 work
> by themselves, right?  So how about we ask Dave to send these to stable?

Yes, they work by themselves and I agree that they should be sent to stable.

> Also, how about 1/5? Also needed for stable?

I think so; without this patch, if we flood the guest with 1-byte packets,
we can consume ~1 GB of guest memory per socket.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* request for stable (was Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput)
  2019-07-17 11:30 [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
                   ` (6 preceding siblings ...)
  2019-07-29 13:59 ` Michael S. Tsirkin
@ 2019-09-03  8:02 ` Michael S. Tsirkin
  7 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-09-03  8:02 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm

Patches 1,3 and 4 are needed for stable.
Dave, could you queue them there please?

On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered an huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
> 
> v4:
> - rebased all patches on current master (conflicts is Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed lock when buf_alloc is written [David];
>            moved this patch after "vsock/virtio: reduce credit update messages"
>            to make it clearer
> - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
>            conflicts
> 
> v3: https://patchwork.kernel.org/cover/10970145
> 
> v2: https://patchwork.kernel.org/cover/10938743
> 
> v1: https://patchwork.kernel.org/cover/10885431
> 
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Micheal suggested in the v1, I booted host and guest with 'nosmap'.
> 
> A brief description of patches:
> - Patches 1:   limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
>                transmitter
> - Patches 4+5: allow the host to split packets on multiple buffers and use
>                VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> 
>                     host -> guest [Gbps]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.032     0.030    0.048    0.051
> 64         0.061     0.059    0.108    0.117
> 128        0.122     0.112    0.227    0.234
> 256        0.244     0.241    0.418    0.415
> 512        0.459     0.466    0.847    0.865
> 1K         0.927     0.919    1.657    1.641
> 2K         1.884     1.813    3.262    3.269
> 4K         3.378     3.326    6.044    6.195
> 8K         5.637     5.676   10.141   11.287
> 16K        8.250     8.402   15.976   16.736
> 32K       13.327    13.204   19.013   20.515
> 64K       21.241    21.341   20.973   21.879
> 128K      21.851    22.354   21.816   23.203
> 256K      21.408    21.693   21.846   24.088
> 512K      21.600    21.899   21.921   24.106
> 
>                     guest -> host [Gbps]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.045     0.046    0.057    0.057
> 64         0.089     0.091    0.103    0.104
> 128        0.170     0.179    0.192    0.200
> 256        0.364     0.351    0.361    0.379
> 512        0.709     0.699    0.731    0.790
> 1K         1.399     1.407    1.395    1.427
> 2K         2.670     2.684    2.745    2.835
> 4K         5.171     5.199    5.305    5.451
> 8K         8.442     8.500   10.083    9.941
> 16K       12.305    12.259   13.519   15.385
> 32K       11.418    11.150   11.988   24.680
> 64K       10.778    10.659   11.589   35.273
> 128K      10.421    10.339   10.939   40.338
> 256K      10.300     9.719   10.508   36.562
> 512K       9.833     9.808   10.612   35.979
> 
> As Stefan suggested in the v1, I measured also the efficiency in this way:
>     efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> 
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
> 
>         host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.35      0.45     0.79     1.02
> 64         0.56      0.80     1.41     1.54
> 128        1.11      1.52     3.03     3.12
> 256        2.20      2.16     5.44     5.58
> 512        4.17      4.18    10.96    11.46
> 1K         8.30      8.26    20.99    20.89
> 2K        16.82     16.31    39.76    39.73
> 4K        30.89     30.79    74.07    75.73
> 8K        53.74     54.49   124.24   148.91
> 16K       80.68     83.63   200.21   232.79
> 32K      132.27    132.52   260.81   357.07
> 64K      229.82    230.40   300.19   444.18
> 128K     332.60    329.78   331.51   492.28
> 256K     331.06    337.22   339.59   511.59
> 512K     335.58    328.50   331.56   504.56
> 
>         guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt   p 1     p 2+3    p 4+5
> 
> 32         0.43      0.43     0.53     0.56
> 64         0.85      0.86     1.04     1.10
> 128        1.63      1.71     2.07     2.13
> 256        3.48      3.35     4.02     4.22
> 512        6.80      6.67     7.97     8.63
> 1K        13.32     13.31    15.72    15.94
> 2K        25.79     25.92    30.84    30.98
> 4K        50.37     50.48    58.79    59.69
> 8K        95.90     96.15   107.04   110.33
> 16K      145.80    145.43   143.97   174.70
> 32K      147.06    144.74   146.02   282.48
> 64K      145.25    143.99   141.62   406.40
> 128K     149.34    146.96   147.49   489.34
> 256K     156.35    149.81   152.21   536.37
> 512K     151.65    150.74   151.52   519.93
> 
> [1] https://github.com/stefano-garzarella/iperf/
> 
> Stefano Garzarella (5):
>   vsock/virtio: limit the memory used per-socket
>   vsock/virtio: reduce credit update messages
>   vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
>   vhost/vsock: split packets to send using multiple buffers
>   vsock/virtio: change the maximum packet size allowed
> 
>  drivers/vhost/vsock.c                   | 68 ++++++++++++-----
>  include/linux/virtio_vsock.h            |  4 +-
>  net/vmw_vsock/virtio_transport.c        |  1 +
>  net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
>  4 files changed, 134 insertions(+), 38 deletions(-)
> 
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-09-01  6:56       ` Michael S. Tsirkin
  2019-09-02  8:39         ` Stefan Hajnoczi
@ 2019-10-11 13:40         ` Stefano Garzarella
  2019-10-11 14:11           ` Michael S. Tsirkin
  2019-10-14  8:17           ` Stefan Hajnoczi
  1 sibling, 2 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-10-11 13:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, netdev, linux-kernel, David S. Miller,
	virtualization, Jason Wang, kvm

On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > >
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > >
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > >
> > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > >
> >
> > Since I'm back from holidays, I'm restarting this thread to figure out
> > how to address the issue completely.
> >
> > I did a better analysis of the credit mechanism that we implemented in
> > virtio-vsock to get a clearer view and I'd share it with you:
> >
> >     This issue affects only the "host->guest" path. In this case, when the
> >     host wants to send a packet to the guest, it uses a "free" buffer
> >     allocated by the guest (4KB).
> >     The "free" buffers available for the host are shared between all
> >     sockets, instead, the credit mechanism is per-socket, I think to
> >     avoid the starvation of other sockets.
> >     The guest re-fills the "free" queue when the available buffers are
> >     less than half.
> >
> >     Each peer has these variables in the per-socket state:
> >        /* local vars */
> >        buf_alloc        /* max bytes usable by this socket
> >                            [exposed to the other peer] */
> >        fwd_cnt          /* increased when RX packet is consumed by the
> >                            user space [exposed to the other peer] */
> >        tx_cnt                 /* increased when TX packet is sent to the other peer */
> >
> >        /* remote vars  */
> >        peer_buf_alloc   /* peer's buf_alloc */
> >        peer_fwd_cnt     /* peer's fwd_cnt */
> >
> >     When a peer sends a packet, it increases the 'tx_cnt'; when the
> >     receiver consumes the packet (copy it to the user-space buffer), it
> >     increases the 'fwd_cnt'.
> >     Note: increments are made considering the payload length and not the
> >     buffer length.
> >
> >     The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> >     all packet headers or with an explicit CREDIT_UPDATE packet.
> >
> >     The local 'buf_alloc' value can be modified by the user space using
> >     setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> >
> >     Before sending a packet, the peer checks the space available:
> >       credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> >     and it will send up to credit_available bytes to the other peer.
> >
> > Possible solutions considering Michael's advice:
> > 1. Use the buffer length instead of the payload length when we increment
> >    the counters:
> >   - This approach will precisely account for the memory used per socket.
> >   - This requires changes in both guest and host.
> >   - It is not compatible with old drivers, so a feature should be negotiated.
> > 2. Decrease the advertised 'buf_alloc' taking into account the bytes queued
> >    in the socket queue but not used (e.g. 256 bytes used out of the 4K
> >    available in the buffer)
> >   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> >   - This should be compatible also with old drivers.
> >
> > Maybe the second is less invasive, but will it be too tricky?
> > Any other advice or suggestions?
> >
> > Thanks in advance,
> > Stefano
>
> OK let me try to clarify.  The idea is this:
>
> Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
> means that in the worst case (128 byte packets), each byte of credit in
> the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> to also account for the virtio_vsock_pkt since I think it's kept around
> until userspace consumes it.
>
> Thus given X buf alloc allowed in the socket, we should publish X/16
> credits to the other side. This will ensure the other side does not send
> more than X/16 bytes for a given socket and thus we won't need to
> allocate more than X bytes to hold the data.
>
> We can play with the copy break value to tweak this.
>

Hi Michael,
sorry for the long silence, but I focused on multi-transport.

Before implementing your idea, I did some calculations and took a closer
look at our credit mechanism:

  buf_alloc = 256 KB (default, tunable through setsockopt)
  sizeof(struct virtio_vsock_pkt) = 128

  - guest (we use preallocated 4 KB buffers to receive packets, copying
    small packets - < 128 bytes -)
    worst_case = 129
    buf_size = 4 KB
    credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32

    credit_published = buf_alloc / credit2mem = ~8 KB
    Space for just 2 full packets (4 KB each)

  - host (we copy packets from the vring, allocating the space for the payload)
    worst_case = 1
    buf_size = 1
    credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129

    credit_published = buf_alloc / credit2mem = ~2 KB
    Less than a full packet (the guest can now send up to 64 KB in a single
    packet, so it will be limited to 2 KB)

Current memory consumption in the worst case if the RX queue is full:
  - guest
    mem = (buf_alloc / worst_case) *
          (buf_size + sizeof(struct virtio_vsock_pkt)) = ~8 MB

  - host
    mem = (buf_alloc / worst_case) *
          (buf_size + sizeof(struct virtio_vsock_pkt)) = ~32 MB

I think that the performance with big packets will be affected,
but I still have to try.
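
To make the arithmetic above easy to replay, here is the same calculation as
a small standalone program (it only restates the numbers in this mail,
nothing new):

#include <stdio.h>

int main(void)
{
	const long buf_alloc = 256 * 1024;  /* default credit */
	const long pkt_struct = 128;        /* sizeof(struct virtio_vsock_pkt) */

	/* guest: preallocated 4 KB buffers, copy-break below 128 bytes */
	long g_worst = 129, g_buf = 4096;
	long g_credit2mem = (g_buf + pkt_struct) / g_worst;            /* 32 */
	printf("guest: publish %ld bytes of credit, worst-case mem %ld MB\n",
	       buf_alloc / g_credit2mem,
	       (buf_alloc / g_worst) * (g_buf + pkt_struct) / (1024 * 1024));

	/* host: payload allocated exactly, worst case 1-byte packets */
	long h_worst = 1, h_buf = 1;
	long h_credit2mem = (h_buf + pkt_struct) / h_worst;            /* 129 */
	printf("host: publish %ld bytes of credit, worst-case mem %ld MB\n",
	       buf_alloc / h_credit2mem,
	       (buf_alloc / h_worst) * (h_buf + pkt_struct) / (1024 * 1024));

	return 0;
}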

Another approach that I want to explore is to play with buf_alloc
published to the peer.

One thing that's not clear to me yet is the meaning of
SO_VM_SOCKETS_BUFFER_SIZE:
- max amount of memory used in the RX queue
- max amount of payload bytes in the RX queue (without overhead of
  struct virtio_vsock_pkt + preallocated buffer)

From the 'include/uapi/linux/vm_sockets.h':
    /* Option name for STREAM socket buffer size.  Use as the option name in
     * setsockopt(3) or getsockopt(3) to set or get an unsigned long long that
     * specifies the size of the buffer underlying a vSockets STREAM socket.
     * Value is clamped to the MIN and MAX.
     */

    #define SO_VM_SOCKETS_BUFFER_SIZE 0
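
(For reference, user space sets it roughly like this - just a sketch, and I
believe the option level is AF_VSOCK, but correct me if I'm wrong:)

    #include <stdio.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    /* Sketch: enlarge the per-socket buffer on an existing AF_VSOCK socket */
    static int set_vsock_buf_size(int fd, unsigned long long size)
    {
            if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
                           &size, sizeof(size)) < 0) {
                    perror("setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)");
                    return -1;
            }
            return 0;
    }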

Regardless, I think we need to limit memory consumption in some way.
I'll check the implementations of the other transports to understand this better.

I'll keep you updated!

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-10-11 13:40         ` Stefano Garzarella
@ 2019-10-11 14:11           ` Michael S. Tsirkin
  2019-10-11 14:23             ` Stefano Garzarella
  2019-10-14  8:17           ` Stefan Hajnoczi
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2019-10-11 14:11 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, netdev, linux-kernel, David S. Miller,
	virtualization, Jason Wang, kvm

On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
> On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > Since I'm back from holidays, I'm restarting this thread to figure out
> > > how to address the issue completely.
> > >
> > > I did a better analysis of the credit mechanism that we implemented in
> > > virtio-vsock to get a clearer view and I'd share it with you:
> > >
> > >     This issue affect only the "host->guest" path. In this case, when the
> > >     host wants to send a packet to the guest, it uses a "free" buffer
> > >     allocated by the guest (4KB).
> > >     The "free" buffers available for the host are shared between all
> > >     sockets, instead, the credit mechanism is per-socket, I think to
> > >     avoid the starvation of others sockets.
> > >     The guests re-fill the "free" queue when the available buffers are
> > >     less than half.
> > >
> > >     Each peer have these variables in the per-socket state:
> > >        /* local vars */
> > >        buf_alloc        /* max bytes usable by this socket
> > >                            [exposed to the other peer] */
> > >        fwd_cnt          /* increased when RX packet is consumed by the
> > >                            user space [exposed to the other peer] */
> > >        tx_cnt                 /* increased when TX packet is sent to the other peer */
> > >
> > >        /* remote vars  */
> > >        peer_buf_alloc   /* peer's buf_alloc */
> > >        peer_fwd_cnt     /* peer's fwd_cnt */
> > >
> > >     When a peer sends a packet, it increases the 'tx_cnt'; when the
> > >     receiver consumes the packet (copy it to the user-space buffer), it
> > >     increases the 'fwd_cnt'.
> > >     Note: increments are made considering the payload length and not the
> > >     buffer length.
> > >
> > >     The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > >     all packet headers or with an explicit CREDIT_UPDATE packet.
> > >
> > >     The local 'buf_alloc' value can be modified by the user space using
> > >     setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > >
> > >     Before to send a packet, the peer checks the space available:
> > >       credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > >     and it will send up to credit_available bytes to the other peer.
> > >
> > > Possible solutions considering Michael's advice:
> > > 1. Use the buffer length instead of the payload length when we increment
> > >    the counters:
> > >   - This approach will account precisely the memory used per socket.
> > >   - This requires changes in both guest and host.
> > >   - It is not compatible with old drivers, so a feature should be negotiated.
> > > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> > >    the socket queue but not used. (e.g. 256 byte used on 4K available in
> > >    the buffer)
> > >   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > >   - This should be compatible also with old drivers.
> > >
> > > Maybe the second is less invasive, but will it be too tricky?
> > > Any other advice or suggestions?
> > >
> > > Thanks in advance,
> > > Stefano
> >
> > OK let me try to clarify.  The idea is this:
> >
> > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
> > means that in the worst case (128 byte packets), each byte of credit in
> > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> > to also account for the virtio_vsock_pkt since I think it's kept around
> > until userspace consumes it.
> >
> > Thus given X buf alloc allowed in the socket, we should publish X/16
> > credits to the other side. This will ensure the other side does not send
> > more than X/16 bytes for a given socket and thus we won't need to
> > allocate more than X bytes to hold the data.
> >
> > We can play with the copy break value to tweak this.
> >
> 
> Hi Michael,
> sorry for the long silence, but I focused on multi-transport.
> 
> Before to implement your idea, I tried to do some calculations and
> looking better to our credit mechanism:
> 
>   buf_alloc = 256 KB (default, tunable through setsockopt)
>   sizeof(struct virtio_vsock_pkt) = 128
> 
>   - guest (we use preallocated 4 KB buffers to receive packets, copying
>     small packet - < 128 -)
>     worst_case = 129
>     buf_size = 4 KB
>     credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32

Making virtio_vsock_pkt smaller will help, btw.
E.g. why keep a work struct per packet? Do we need the header there? Etc.
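
Very roughly, from memory (so not authoritative - the exact layout should be
checked in the code), the 128 bytes break down something like:

    struct virtio_vsock_pkt {
            struct virtio_vsock_hdr hdr;  /* ~44 bytes - needed after consume? */
            struct work_struct      work; /* ~32+ bytes - why per packet?      */
            struct list_head        list; /* 16 bytes                          */
            void                    *buf; /* plus buf_len/len/off counters...  */
            ...
    };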

> 
>     credit_published = buf_alloc / credit2mem = ~8 KB
>     Space for just 2 full packet (4 KB)
> 
>   - host (we copy packets from the vring, allocating the space for the payload)
>     worst_case = 1
>     buf_size = 1
>     credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129
> 
>     credit_published = buf_alloc / credit2mem = ~2 KB
>     Less than a full packet (guest now can send up to 64 KB with a single
>     packet, so it will be limited to 2 KB)

1 byte in struct virtio_vsock_pkt is crazy. Can't we copy
like we do in the guest? E.g. allocate at least 128 bytes,
and then we can add data to the tail of the last packet?

> Current memory consumption in the worst case if the RX queue is full:
>   - guest
>     mem = (buf_alloc / worst_case) *
>           (buf_size + sizeof(struct virtio_vsock_pkt) = ~8MB
> 
>   - host
>     mem = (buf_alloc / worst_case) *
>           (buf_size + sizeof(struct virtio_vsock_pkt) = ~32MB
> 
> I think that the performance with big packets will be affected,
> but I still have to try.
> 
> Another approach that I want to explore is to play with buf_alloc
> published to the peer.
> 
> One thing that's not clear to me yet is the meaning of
> SO_VM_SOCKETS_BUFFER_SIZE:
> - max amount of memory used in the RX queue
> - max amount of payload bytes in the RX queue (without overhead of
>   struct virtio_vsock_pkt + preallocated buffer)
> 
> From the 'include/uapi/linux/vm_sockets.h':
>     /* Option name for STREAM socket buffer size.  Use as the option name in
>      * setsockopt(3) or getsockopt(3) to set or get an unsigned long long that
>      * specifies the size of the buffer underlying a vSockets STREAM socket.
>      * Value is clamped to the MIN and MAX.
>      */
> 
>     #define SO_VM_SOCKETS_BUFFER_SIZE 0
> 
> Regardless, I think we need to limit memory consumption in some way.
> I'll check the implementation of other transports, to understand better.
> 
> I'll keep you updated!
> 
> Thanks,
> Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-10-11 14:11           ` Michael S. Tsirkin
@ 2019-10-11 14:23             ` Stefano Garzarella
  0 siblings, 0 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-10-11 14:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, netdev, linux-kernel, David S. Miller,
	virtualization, Jason Wang, kvm

On Fri, Oct 11, 2019 at 10:11:19AM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
> > On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > and pushed to the guest using the vring, are directly queued in
> > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > with a fixed size (4 KB).
> > > > > >
> > > > > > The maximum amount of memory used by each socket should be
> > > > > > controlled by the credit mechanism.
> > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > to avoid starvation of other sockets.
> > > > > >
> > > > > > This patch mitigates this issue copying the payload of small
> > > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > > order to avoid wasting memory.
> > > > > >
> > > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > >
> > > > > This is good enough for net-next, but for net I think we
> > > > > should figure out how to address the issue completely.
> > > > > Can we make the accounting precise? What happens to
> > > > > performance if we do?
> > > > >
> > > >
> > > > Since I'm back from holidays, I'm restarting this thread to figure out
> > > > how to address the issue completely.
> > > >
> > > > I did a better analysis of the credit mechanism that we implemented in
> > > > virtio-vsock to get a clearer view and I'd share it with you:
> > > >
> > > >     This issue affect only the "host->guest" path. In this case, when the
> > > >     host wants to send a packet to the guest, it uses a "free" buffer
> > > >     allocated by the guest (4KB).
> > > >     The "free" buffers available for the host are shared between all
> > > >     sockets, instead, the credit mechanism is per-socket, I think to
> > > >     avoid the starvation of others sockets.
> > > >     The guests re-fill the "free" queue when the available buffers are
> > > >     less than half.
> > > >
> > > >     Each peer have these variables in the per-socket state:
> > > >        /* local vars */
> > > >        buf_alloc        /* max bytes usable by this socket
> > > >                            [exposed to the other peer] */
> > > >        fwd_cnt          /* increased when RX packet is consumed by the
> > > >                            user space [exposed to the other peer] */
> > > >        tx_cnt                 /* increased when TX packet is sent to the other peer */
> > > >
> > > >        /* remote vars  */
> > > >        peer_buf_alloc   /* peer's buf_alloc */
> > > >        peer_fwd_cnt     /* peer's fwd_cnt */
> > > >
> > > >     When a peer sends a packet, it increases the 'tx_cnt'; when the
> > > >     receiver consumes the packet (copy it to the user-space buffer), it
> > > >     increases the 'fwd_cnt'.
> > > >     Note: increments are made considering the payload length and not the
> > > >     buffer length.
> > > >
> > > >     The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > > >     all packet headers or with an explicit CREDIT_UPDATE packet.
> > > >
> > > >     The local 'buf_alloc' value can be modified by the user space using
> > > >     setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > > >
> > > >     Before to send a packet, the peer checks the space available:
> > > >       credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > > >     and it will send up to credit_available bytes to the other peer.
> > > >
> > > > Possible solutions considering Michael's advice:
> > > > 1. Use the buffer length instead of the payload length when we increment
> > > >    the counters:
> > > >   - This approach will account precisely the memory used per socket.
> > > >   - This requires changes in both guest and host.
> > > >   - It is not compatible with old drivers, so a feature should be negotiated.
> > > > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> > > >    the socket queue but not used. (e.g. 256 byte used on 4K available in
> > > >    the buffer)
> > > >   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > > >   - This should be compatible also with old drivers.
> > > >
> > > > Maybe the second is less invasive, but will it be too tricky?
> > > > Any other advice or suggestions?
> > > >
> > > > Thanks in advance,
> > > > Stefano
> > >
> > > OK let me try to clarify.  The idea is this:
> > >
> > > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
> > > means that in the worst case (128 byte packets), each byte of credit in
> > > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> > > to also account for the virtio_vsock_pkt since I think it's kept around
> > > until userspace consumes it.
> > >
> > > Thus given X buf alloc allowed in the socket, we should publish X/16
> > > credits to the other side. This will ensure the other side does not send
> > > more than X/16 bytes for a given socket and thus we won't need to
> > > allocate more than X bytes to hold the data.
> > >
> > > We can play with the copy break value to tweak this.
> > >
> > 
> > Hi Michael,
> > sorry for the long silence, but I focused on multi-transport.
> > 
> > Before to implement your idea, I tried to do some calculations and
> > looking better to our credit mechanism:
> > 
> >   buf_alloc = 256 KB (default, tunable through setsockopt)
> >   sizeof(struct virtio_vsock_pkt) = 128
> > 
> >   - guest (we use preallocated 4 KB buffers to receive packets, copying
> >     small packet - < 128 -)
> >     worst_case = 129
> >     buf_size = 4 KB
> >     credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32
> 
> Making virtio_vsock_pkt smaller will help btw.
> E.g. why keep a work struct per packet? do we need the header there? etc
> etc.
> 

Yes, I had the same feeling. I'll try to make it smaller!


> > 
> >     credit_published = buf_alloc / credit2mem = ~8 KB
> >     Space for just 2 full packet (4 KB)
> > 
> >   - host (we copy packets from the vring, allocating the space for the payload)
> >     worst_case = 1
> >     buf_size = 1
> >     credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129
> > 
> >     credit_published = buf_alloc / credit2mem = ~2 KB
> >     Less than a full packet (guest now can send up to 64 KB with a single
> >     packet, so it will be limited to 2 KB)
> 
> 1 byte in struct virtio_vsock_pkt is crazy. Can't we copy

I know :-)

> like we do in the guest? E.g. allocate at least 128 bytes,
> and then we can add data to the tail of the last packet?

Yes, I've been thinking about trying that out (allocate at least 128 bytes
and do as we do in the guest).
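
Roughly something along these lines in the receive path (a very rough,
untested sketch; the field names are borrowed from the existing guest-side
code and still need to be double-checked):

    /* Assumption: the host now allocates at least 128 bytes of payload per
     * pkt (like the guest copy break), so a small payload can be appended
     * to the previous packet still sitting on the socket's rx_queue.
     */
    struct virtio_vsock_pkt *last;

    last = list_last_entry_or_null(&vvs->rx_queue,
                                   struct virtio_vsock_pkt, list);
    if (last && pkt->len <= last->buf_len - last->len) {
            memcpy(last->buf + last->len, pkt->buf, pkt->len);
            last->len += pkt->len;
            last->hdr.len = cpu_to_le32(last->len);
            virtio_transport_free_pkt(pkt);  /* payload merged, pkt not queued */
    } else {
            list_add_tail(&pkt->list, &vvs->rx_queue);
    }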


Thanks,
Stefano

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-10-11 13:40         ` Stefano Garzarella
  2019-10-11 14:11           ` Michael S. Tsirkin
@ 2019-10-14  8:17           ` Stefan Hajnoczi
  2019-10-14  8:21             ` Jason Wang
  1 sibling, 1 reply; 69+ messages in thread
From: Stefan Hajnoczi @ 2019-10-14  8:17 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, netdev, linux-kernel, David S. Miller,
	virtualization, Jason Wang, kvm

On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
> On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > Since I'm back from holidays, I'm restarting this thread to figure out
> > > how to address the issue completely.
> > >
> > > I did a better analysis of the credit mechanism that we implemented in
> > > virtio-vsock to get a clearer view and I'd share it with you:
> > >
> > >     This issue affect only the "host->guest" path. In this case, when the
> > >     host wants to send a packet to the guest, it uses a "free" buffer
> > >     allocated by the guest (4KB).
> > >     The "free" buffers available for the host are shared between all
> > >     sockets, instead, the credit mechanism is per-socket, I think to
> > >     avoid the starvation of others sockets.
> > >     The guests re-fill the "free" queue when the available buffers are
> > >     less than half.
> > >
> > >     Each peer have these variables in the per-socket state:
> > >        /* local vars */
> > >        buf_alloc        /* max bytes usable by this socket
> > >                            [exposed to the other peer] */
> > >        fwd_cnt          /* increased when RX packet is consumed by the
> > >                            user space [exposed to the other peer] */
> > >        tx_cnt                 /* increased when TX packet is sent to the other peer */
> > >
> > >        /* remote vars  */
> > >        peer_buf_alloc   /* peer's buf_alloc */
> > >        peer_fwd_cnt     /* peer's fwd_cnt */
> > >
> > >     When a peer sends a packet, it increases the 'tx_cnt'; when the
> > >     receiver consumes the packet (copy it to the user-space buffer), it
> > >     increases the 'fwd_cnt'.
> > >     Note: increments are made considering the payload length and not the
> > >     buffer length.
> > >
> > >     The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > >     all packet headers or with an explicit CREDIT_UPDATE packet.
> > >
> > >     The local 'buf_alloc' value can be modified by the user space using
> > >     setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > >
> > >     Before to send a packet, the peer checks the space available:
> > >       credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > >     and it will send up to credit_available bytes to the other peer.
> > >
> > > Possible solutions considering Michael's advice:
> > > 1. Use the buffer length instead of the payload length when we increment
> > >    the counters:
> > >   - This approach will account precisely the memory used per socket.
> > >   - This requires changes in both guest and host.
> > >   - It is not compatible with old drivers, so a feature should be negotiated.
> > > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> > >    the socket queue but not used. (e.g. 256 byte used on 4K available in
> > >    the buffer)
> > >   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > >   - This should be compatible also with old drivers.
> > >
> > > Maybe the second is less invasive, but will it be too tricky?
> > > Any other advice or suggestions?
> > >
> > > Thanks in advance,
> > > Stefano
> >
> > OK let me try to clarify.  The idea is this:
> >
> > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
> > means that in the worst case (128 byte packets), each byte of credit in
> > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> > to also account for the virtio_vsock_pkt since I think it's kept around
> > until userspace consumes it.
> >
> > Thus given X buf alloc allowed in the socket, we should publish X/16
> > credits to the other side. This will ensure the other side does not send
> > more than X/16 bytes for a given socket and thus we won't need to
> > allocate more than X bytes to hold the data.
> >
> > We can play with the copy break value to tweak this.
> >
> 
> Hi Michael,
> sorry for the long silence, but I focused on multi-transport.
> 
> Before to implement your idea, I tried to do some calculations and
> looking better to our credit mechanism:
> 
>   buf_alloc = 256 KB (default, tunable through setsockopt)
>   sizeof(struct virtio_vsock_pkt) = 128
> 
>   - guest (we use preallocated 4 KB buffers to receive packets, copying
>     small packet - < 128 -)
>     worst_case = 129
>     buf_size = 4 KB
>     credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32
> 
>     credit_published = buf_alloc / credit2mem = ~8 KB
>     Space for just 2 full packet (4 KB)
> 
>   - host (we copy packets from the vring, allocating the space for the payload)
>     worst_case = 1
>     buf_size = 1
>     credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129
> 
>     credit_published = buf_alloc / credit2mem = ~2 KB
>     Less than a full packet (guest now can send up to 64 KB with a single
>     packet, so it will be limited to 2 KB)
> 
> Current memory consumption in the worst case if the RX queue is full:
>   - guest
>     mem = (buf_alloc / worst_case) *
>           (buf_size + sizeof(struct virtio_vsock_pkt) = ~8MB
> 
>   - host
>     mem = (buf_alloc / worst_case) *
>           (buf_size + sizeof(struct virtio_vsock_pkt) = ~32MB
> 
> I think that the performance with big packets will be affected,
> but I still have to try.
> 
> Another approach that I want to explore is to play with buf_alloc
> published to the peer.
> 
> One thing that's not clear to me yet is the meaning of
> SO_VM_SOCKETS_BUFFER_SIZE:
> - max amount of memory used in the RX queue
> - max amount of payload bytes in the RX queue (without overhead of
>   struct virtio_vsock_pkt + preallocated buffer)
> 
> From the 'include/uapi/linux/vm_sockets.h':
>     /* Option name for STREAM socket buffer size.  Use as the option name in
>      * setsockopt(3) or getsockopt(3) to set or get an unsigned long long that
>      * specifies the size of the buffer underlying a vSockets STREAM socket.
>      * Value is clamped to the MIN and MAX.
>      */
> 
>     #define SO_VM_SOCKETS_BUFFER_SIZE 0
> 
> Regardless, I think we need to limit memory consumption in some way.
> I'll check the implementation of other transports, to understand better.

SO_VM_SOCKETS_BUFFER_SIZE might have been useful for VMCI-specific
applications, but we should use SO_RCVBUF and SO_SNDBUF for portable
applications in the future.  Those socket options also work with other
address families.
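
(i.e. a portable application would just use the generic knobs - sketch:)

    #include <sys/socket.h>

    /* Sketch: portable buffer sizing that would work for any address family
     * once the transport honours SO_RCVBUF/SO_SNDBUF.
     */
    static void set_portable_buf_sizes(int fd, int bytes)
    {
            setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
            setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
    }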

I guess these sockopts are bypassed by AF_VSOCK because it doesn't use
the common skb queuing code in net/core/sock.c :(.  But one day we might
migrate to it...

Stefan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-10-14  8:17           ` Stefan Hajnoczi
@ 2019-10-14  8:21             ` Jason Wang
  2019-10-14  8:38               ` Stefano Garzarella
  0 siblings, 1 reply; 69+ messages in thread
From: Jason Wang @ 2019-10-14  8:21 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella
  Cc: Michael S. Tsirkin, netdev, linux-kernel, David S. Miller,
	virtualization, kvm


On 2019/10/14 4:17 PM, Stefan Hajnoczi wrote:
> SO_VM_SOCKETS_BUFFER_SIZE might have been useful for VMCI-specific
> applications, but we should use SO_RCVBUF and SO_SNDBUF for portable
> applications in the future.  Those socket options also work with other
> address families.
>
> I guess these sockopts are bypassed by AF_VSOCK because it doesn't use
> the common skb queuing code in net/core/sock.c :(.  But one day we might
> migrate to it...
>
> Stefan


+1, we should really consider reusing the existing socket mechanism
instead of re-inventing the wheel.

Thanks


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
  2019-10-14  8:21             ` Jason Wang
@ 2019-10-14  8:38               ` Stefano Garzarella
  0 siblings, 0 replies; 69+ messages in thread
From: Stefano Garzarella @ 2019-10-14  8:38 UTC (permalink / raw)
  To: Jason Wang, Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, netdev, linux-kernel,
	David S. Miller, virtualization, kvm

On Mon, Oct 14, 2019 at 04:21:35PM +0800, Jason Wang wrote:
On 2019/10/14 4:17 PM, Stefan Hajnoczi wrote:
> > SO_VM_SOCKETS_BUFFER_SIZE might have been useful for VMCI-specific
> > applications, but we should use SO_RCVBUF and SO_SNDBUF for portable
> > applications in the future.  Those socket options also work with other
> > address families.
> > 

I think hyperv_transport started using it with this patch:
ac383f58f3c9  hv_sock: perf: Allow the socket buffer size options to influence
              the actual socket buffers


> > I guess these sockopts are bypassed by AF_VSOCK because it doesn't use
> > the common skb queuing code in net/core/sock.c :(.  But one day we might
> > migrate to it...
> > 
> > Stefan
> 
> 
> +1, we should really consider reusing the existing socket mechanism
> instead of re-inventing the wheel.

+1, I totally agree. I'll go this way.
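
Concretely, I expect to lean on the generic receive-buffer accounting,
which (simplified, from what I remember of net/core/sock.c, so not the
exact upstream code) does roughly:

    if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
        (unsigned int)sk->sk_rcvbuf)
            return -ENOMEM;           /* socket is over its receive budget */

    skb_set_owner_r(skb, sk);         /* charge skb->truesize to the socket */
    __skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk);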

Guys, thank you all for your suggestions!

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2019-10-14  8:38 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-17 11:30 [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefano Garzarella
2019-07-17 11:30 ` [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket Stefano Garzarella
2019-07-29 14:04   ` Michael S. Tsirkin
2019-07-29 15:36     ` Stefano Garzarella
2019-07-29 15:49       ` Michael S. Tsirkin
2019-07-29 16:19         ` Stefano Garzarella
2019-07-29 16:50           ` Stefano Garzarella
2019-07-29 19:10             ` Michael S. Tsirkin
2019-07-30  9:35               ` Stefano Garzarella
2019-07-30 20:42                 ` Michael S. Tsirkin
2019-08-01 10:47                   ` Stefano Garzarella
2019-08-01 13:21                     ` Michael S. Tsirkin
2019-08-01 13:36                       ` Stefano Garzarella
2019-09-01  8:26                         ` Michael S. Tsirkin
2019-09-01 10:17                           ` Michael S. Tsirkin
2019-09-02  9:57                             ` Stefano Garzarella
2019-09-02 15:23                               ` Michael S. Tsirkin
2019-09-03  4:39                               ` Michael S. Tsirkin
2019-09-03  7:45                                 ` Stefano Garzarella
2019-09-03  7:52                                   ` Michael S. Tsirkin
2019-09-03  8:00                                     ` Stefano Garzarella
2019-07-29 16:01       ` Michael S. Tsirkin
2019-07-29 16:41         ` Stefano Garzarella
2019-07-29 19:33           ` Michael S. Tsirkin
2019-08-30  9:40     ` Stefano Garzarella
2019-09-01  6:56       ` Michael S. Tsirkin
2019-09-02  8:39         ` Stefan Hajnoczi
2019-09-02  8:55           ` Stefano Garzarella
2019-10-11 13:40         ` Stefano Garzarella
2019-10-11 14:11           ` Michael S. Tsirkin
2019-10-11 14:23             ` Stefano Garzarella
2019-10-14  8:17           ` Stefan Hajnoczi
2019-10-14  8:21             ` Jason Wang
2019-10-14  8:38               ` Stefano Garzarella
2019-07-17 11:30 ` [PATCH v4 2/5] vsock/virtio: reduce credit update messages Stefano Garzarella
2019-07-22  8:36   ` Stefan Hajnoczi
2019-09-03  4:38   ` Michael S. Tsirkin
2019-09-03  7:31     ` Stefano Garzarella
2019-09-03  7:38       ` Michael S. Tsirkin
2019-07-17 11:30 ` [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt() Stefano Garzarella
2019-07-17 14:51   ` Michael S. Tsirkin
2019-07-18  7:43     ` Stefano Garzarella
2019-07-22  8:53   ` Stefan Hajnoczi
2019-07-17 11:30 ` [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers Stefano Garzarella
2019-07-17 14:54   ` Michael S. Tsirkin
2019-07-18  7:50     ` Stefano Garzarella
2019-07-18  8:13       ` Michael S. Tsirkin
2019-07-18  9:37         ` Stefano Garzarella
2019-07-18 11:35           ` Michael S. Tsirkin
2019-07-19  8:08             ` Stefano Garzarella
2019-07-19  8:21               ` Jason Wang
2019-07-19  8:39                 ` Stefano Garzarella
2019-07-19  8:51                   ` Jason Wang
2019-07-19  9:20                     ` Stefano Garzarella
2019-07-22  9:06   ` Stefan Hajnoczi
2019-07-17 11:30 ` [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed Stefano Garzarella
2019-07-17 14:59   ` Michael S. Tsirkin
2019-07-18  7:52     ` Stefano Garzarella
2019-07-18 12:33       ` Michael S. Tsirkin
2019-07-19  8:29         ` Stefano Garzarella
2019-07-22  9:07   ` Stefan Hajnoczi
2019-07-22  9:08 ` [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput Stefan Hajnoczi
2019-07-22  9:14   ` Stefano Garzarella
2019-07-29 13:12     ` Stefan Hajnoczi
2019-07-29 13:59 ` Michael S. Tsirkin
2019-07-30  9:40   ` Stefano Garzarella
2019-07-30 10:03     ` Jason Wang
2019-07-30 15:38       ` Stefano Garzarella
2019-09-03  8:02 ` request for stable (was Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput) Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).