kvm.vger.kernel.org archive mirror
* [RFC PATCH 0/7] Untrusted device support for virtio
@ 2021-04-21  3:21 Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 1/7] virtio-ring: maintain next in extra state for packed virtqueue Jason Wang
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Jason Wang @ 2021-04-21  3:21 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm

Hi All:

Sometimes the driver doesn't trust the device. This usually happens
with encrypted VMs or VDUSE[1]. In both cases, technology like swiotlb
is used to prevent the device from poking/mangling guest memory. But
this is not sufficient, since the current virtio driver may trust what
is stored in the descriptor table (a coherent mapping) when performing
DMA operations like unmap and bounce, so the device may choose to
abuse the behaviour of swiotlb to perform attacks[2].

As a second layer of defence against a malicious device, when the DMA
API is used for the device, this series stores and uses the descriptor
metadata in an auxiliary structure that cannot be accessed via
swiotlb, instead of the copies in the descriptor table. We've actually
almost achieved that for the packed virtqueue already; we just need to
fix a corner case in the handling of mapping errors. For the split
virtqueue we follow what's done for the packed one.
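
To make the idea concrete, here is a rough sketch (illustrative only,
not code from the series; the structure and helper names are made up,
while the fields mirror the vring_desc_extra structure used in the
patches below): the driver keeps a private shadow of each descriptor's
DMA metadata and consults only that shadow when unmapping, so a device
that rewrites the descriptor table cannot influence the unmap
parameters.

#include <linux/dma-mapping.h>
#include <linux/virtio_ring.h>

/* Illustrative only: a driver-private copy of the DMA metadata that
 * the device can never reach (mirrors struct vring_desc_extra).
 */
struct desc_shadow {
	dma_addr_t addr;	/* DMA address recorded at map time */
	u32 len;		/* mapped length recorded at map time */
	u16 flags;		/* descriptor flags recorded at map time */
	u16 next;		/* driver-private free list link */
};

/* Unmap using the shadow copy, never the device-writable ring. */
static void unmap_from_shadow(struct device *dev,
			      const struct desc_shadow *s)
{
	dma_unmap_page(dev, s->addr, s->len,
		       (s->flags & VRING_DESC_F_WRITE) ?
		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
}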

Note that we don't duplicate the descriptor metadata for indirect
descriptors, since they use streaming mappings which are read-only for
the device; so they are safe as long as the metadata of the
non-indirect descriptors is correct.

The behavior of the non-DMA-API path is kept unchanged to minimize the
performance impact.

Lightly tested with packed on/off, iommu on/off, and swiotlb force/off
in the guest.

Please review.

[1] https://lore.kernel.org/netdev/fab615ce-5e13-a3b3-3715-a4203b4ab010@redhat.com/T/
[2] https://yhbt.net/lore/all/c3629a27-3590-1d9f-211b-c0b7be152b32@redhat.com/T/#mc6b6e2343cbeffca68ca7a97e0f473aaa871c95b

Jason Wang (7):
  virtio-ring: maintain next in extra state for packed virtqueue
  virtio_ring: rename vring_desc_extra_packed
  virtio-ring: factor out desc_extra allocation
  virtio_ring: secure handling of mapping errors
  virtio_ring: introduce virtqueue_desc_add_split()
  virtio: use err label in __vring_new_virtqueue()
  virtio-ring: store DMA metadata in desc_extra for split virtqueue

 drivers/virtio/virtio_ring.c | 189 ++++++++++++++++++++++++++---------
 1 file changed, 141 insertions(+), 48 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC PATCH 1/7] virtio-ring: maintain next in extra state for packed virtqueue
  2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
@ 2021-04-21  3:21 ` Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 2/7] virtio_ring: rename vring_desc_extra_packed Jason Wang
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2021-04-21  3:21 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm

This patch moves the next field from vring_desc_state_packed to
vring_desc_extra_packed. This makes it simpler to let the extra state
be reused by the split virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..e1e9ed42e637 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -74,7 +74,6 @@ struct vring_desc_state_packed {
 	void *data;			/* Data for callback. */
 	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
 	u16 num;			/* Descriptor list length. */
-	u16 next;			/* The next desc state in a list. */
 	u16 last;			/* The last desc state in a list. */
 };
 
@@ -82,6 +81,7 @@ struct vring_desc_extra_packed {
 	dma_addr_t addr;		/* Buffer DMA addr. */
 	u32 len;			/* Buffer length. */
 	u16 flags;			/* Descriptor flags. */
+	u16 next;			/* The next desc state in a list. */
 };
 
 struct vring_virtqueue {
@@ -1061,7 +1061,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 				1 << VRING_PACKED_DESC_F_USED;
 	}
 	vq->packed.next_avail_idx = n;
-	vq->free_head = vq->packed.desc_state[id].next;
+	vq->free_head = vq->packed.desc_extra[id].next;
 
 	/* Store token and indirect buffer state. */
 	vq->packed.desc_state[id].num = 1;
@@ -1169,7 +1169,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 					le16_to_cpu(flags);
 			}
 			prev = curr;
-			curr = vq->packed.desc_state[curr].next;
+			curr = vq->packed.desc_extra[curr].next;
 
 			if ((unlikely(++i >= vq->packed.vring.num))) {
 				i = 0;
@@ -1290,7 +1290,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 	/* Clear data ptr. */
 	state->data = NULL;
 
-	vq->packed.desc_state[state->last].next = vq->free_head;
+	vq->packed.desc_extra[state->last].next = vq->free_head;
 	vq->free_head = id;
 	vq->vq.num_free += state->num;
 
@@ -1299,7 +1299,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		for (i = 0; i < state->num; i++) {
 			vring_unmap_state_packed(vq,
 				&vq->packed.desc_extra[curr]);
-			curr = vq->packed.desc_state[curr].next;
+			curr = vq->packed.desc_extra[curr].next;
 		}
 	}
 
@@ -1649,8 +1649,6 @@ static struct virtqueue *vring_create_virtqueue_packed(
 
 	/* Put everything in free lists. */
 	vq->free_head = 0;
-	for (i = 0; i < num-1; i++)
-		vq->packed.desc_state[i].next = i + 1;
 
 	vq->packed.desc_extra = kmalloc_array(num,
 			sizeof(struct vring_desc_extra_packed),
@@ -1661,6 +1659,9 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	memset(vq->packed.desc_extra, 0,
 		num * sizeof(struct vring_desc_extra_packed));
 
+	for (i = 0; i < num - 1; i++)
+		vq->packed.desc_extra[i].next = i + 1;
+
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback) {
 		vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH 2/7] virtio_ring: rename vring_desc_extra_packed
  2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 1/7] virtio-ring: maintain next in extra state for packed virtqueue Jason Wang
@ 2021-04-21  3:21 ` Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 3/7] virtio-ring: factor out desc_extra allocation Jason Wang
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2021-04-21  3:21 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm

Rename vring_desc_extra_packed to vring_desc_extra since the structure
is generic enough to be reused by the split virtqueue as well.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e1e9ed42e637..c25ea5776687 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -77,7 +77,7 @@ struct vring_desc_state_packed {
 	u16 last;			/* The last desc state in a list. */
 };
 
-struct vring_desc_extra_packed {
+struct vring_desc_extra {
 	dma_addr_t addr;		/* Buffer DMA addr. */
 	u32 len;			/* Buffer length. */
 	u16 flags;			/* Descriptor flags. */
@@ -166,7 +166,7 @@ struct vring_virtqueue {
 
 			/* Per-descriptor state. */
 			struct vring_desc_state_packed *desc_state;
-			struct vring_desc_extra_packed *desc_extra;
+			struct vring_desc_extra *desc_extra;
 
 			/* DMA address and size information */
 			dma_addr_t ring_dma_addr;
@@ -912,7 +912,7 @@ static struct virtqueue *vring_create_virtqueue_split(
  */
 
 static void vring_unmap_state_packed(const struct vring_virtqueue *vq,
-				     struct vring_desc_extra_packed *state)
+				     struct vring_desc_extra *state)
 {
 	u16 flags;
 
@@ -1651,13 +1651,13 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->free_head = 0;
 
 	vq->packed.desc_extra = kmalloc_array(num,
-			sizeof(struct vring_desc_extra_packed),
+			sizeof(struct vring_desc_extra),
 			GFP_KERNEL);
 	if (!vq->packed.desc_extra)
 		goto err_desc_extra;
 
 	memset(vq->packed.desc_extra, 0,
-		num * sizeof(struct vring_desc_extra_packed));
+		num * sizeof(struct vring_desc_extra));
 
 	for (i = 0; i < num - 1; i++)
 		vq->packed.desc_extra[i].next = i + 1;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH 3/7] virtio-ring: factor out desc_extra allocation
  2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 1/7] virtio-ring: maintain next in extra state for packed virtqueue Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 2/7] virtio_ring: rename vring_desc_extra_packed Jason Wang
@ 2021-04-21  3:21 ` Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 4/7] virtio_ring: secure handling of mapping errors Jason Wang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2021-04-21  3:21 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm

Introduce a helper for the logic of allocating the descriptor extra
data. This will be reused by the split virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c25ea5776687..0cdd965dba58 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1550,6 +1550,25 @@ static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
 	return NULL;
 }
 
+static struct vring_desc_extra *vring_alloc_desc_extra(struct vring_virtqueue *vq,
+						       unsigned int num)
+{
+	struct vring_desc_extra *desc_extra;
+	unsigned int i;
+
+	desc_extra = kmalloc_array(num, sizeof(struct vring_desc_extra),
+				   GFP_KERNEL);
+	if (!desc_extra)
+		return NULL;
+
+	memset(desc_extra, 0, num * sizeof(struct vring_desc_extra));
+
+	for (i = 0; i < num - 1; i++)
+		desc_extra[i].next = i + 1;
+
+	return desc_extra;
+}
+
 static struct virtqueue *vring_create_virtqueue_packed(
 	unsigned int index,
 	unsigned int num,
@@ -1567,7 +1586,6 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	struct vring_packed_desc_event *driver, *device;
 	dma_addr_t ring_dma_addr, driver_event_dma_addr, device_event_dma_addr;
 	size_t ring_size_in_bytes, event_size_in_bytes;
-	unsigned int i;
 
 	ring_size_in_bytes = num * sizeof(struct vring_packed_desc);
 
@@ -1650,18 +1668,10 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	/* Put everything in free lists. */
 	vq->free_head = 0;
 
-	vq->packed.desc_extra = kmalloc_array(num,
-			sizeof(struct vring_desc_extra),
-			GFP_KERNEL);
+	vq->packed.desc_extra = vring_alloc_desc_extra(vq, num);
 	if (!vq->packed.desc_extra)
 		goto err_desc_extra;
 
-	memset(vq->packed.desc_extra, 0,
-		num * sizeof(struct vring_desc_extra));
-
-	for (i = 0; i < num - 1; i++)
-		vq->packed.desc_extra[i].next = i + 1;
-
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback) {
 		vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH 4/7] virtio_ring: secure handling of mapping errors
  2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
                   ` (2 preceding siblings ...)
  2021-04-21  3:21 ` [RFC PATCH 3/7] virtio-ring: factor out desc_extra allocation Jason Wang
@ 2021-04-21  3:21 ` Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 5/7] virtio_ring: introduce virtqueue_desc_add_split() Jason Wang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2021-04-21  3:21 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm

We should not depend on the DMA address, length and flags in the
descriptor table since they could be written with arbitrary values by
the device. So this patch switches to using the copies stored in
desc_extra.

Note that the indirect descriptors are fine since they are read-only
streaming mappings.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0cdd965dba58..5509c2643fb1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1213,13 +1213,16 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 unmap_release:
 	err_idx = i;
 	i = head;
+	curr = vq->free_head;
 
 	vq->packed.avail_used_flags = avail_used_flags;
 
 	for (n = 0; n < total_sg; n++) {
 		if (i == err_idx)
 			break;
-		vring_unmap_desc_packed(vq, &desc[i]);
+		vring_unmap_state_packed(vq,
+					 &vq->packed.desc_extra[curr]);
+		curr = vq->packed.desc_extra[curr].next;
 		i++;
 		if (i >= vq->packed.vring.num)
 			i = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH 5/7] virtio_ring: introduce virtqueue_desc_add_split()
  2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
                   ` (3 preceding siblings ...)
  2021-04-21  3:21 ` [RFC PATCH 4/7] virtio_ring: secure handling of mapping errors Jason Wang
@ 2021-04-21  3:21 ` Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 6/7] virtio: use err label in __vring_new_virtqueue() Jason Wang
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2021-04-21  3:21 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm

This patch introduces a helper for storing a descriptor in the
descriptor table of the split virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 39 ++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5509c2643fb1..11dfa0dc8ec1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -412,6 +412,20 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
 	return desc;
 }
 
+static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
+						    struct vring_desc *desc,
+						    unsigned int i,
+						    dma_addr_t addr,
+						    unsigned int len,
+						    u16 flags)
+{
+	desc[i].flags = cpu_to_virtio16(vq->vdev, flags);
+	desc[i].addr = cpu_to_virtio64(vq->vdev, addr);
+	desc[i].len = cpu_to_virtio32(vq->vdev, len);
+
+	return virtio16_to_cpu(vq->vdev, desc[i].next);
+}
+
 static inline int virtqueue_add_split(struct virtqueue *_vq,
 				      struct scatterlist *sgs[],
 				      unsigned int total_sg,
@@ -484,11 +498,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 			if (vring_mapping_error(vq, addr))
 				goto unmap_release;
 
-			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
-			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
-			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
 			prev = i;
-			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
+			i = virtqueue_add_desc_split(_vq, desc, i, addr, sg->length,
+						     VRING_DESC_F_NEXT);
 		}
 	}
 	for (; n < (out_sgs + in_sgs); n++) {
@@ -497,11 +509,11 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 			if (vring_mapping_error(vq, addr))
 				goto unmap_release;
 
-			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
-			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
-			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
 			prev = i;
-			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
+			i = virtqueue_add_desc_split(_vq, desc, i, addr,
+						     sg->length,
+						     VRING_DESC_F_NEXT |
+						     VRING_DESC_F_WRITE);
 		}
 	}
 	/* Last one doesn't continue. */
@@ -515,13 +527,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		if (vring_mapping_error(vq, addr))
 			goto unmap_release;
 
-		vq->split.vring.desc[head].flags = cpu_to_virtio16(_vq->vdev,
-				VRING_DESC_F_INDIRECT);
-		vq->split.vring.desc[head].addr = cpu_to_virtio64(_vq->vdev,
-				addr);
-
-		vq->split.vring.desc[head].len = cpu_to_virtio32(_vq->vdev,
-				total_sg * sizeof(struct vring_desc));
+		virtqueue_add_desc_split(_vq, vq->split.vring.desc,
+					 head, addr,
+					 total_sg * sizeof(struct vring_desc),
+			                 VRING_DESC_F_INDIRECT);
 	}
 
 	/* We're using some buffers from the free list. */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH 6/7] virtio: use err label in __vring_new_virtqueue()
  2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
                   ` (4 preceding siblings ...)
  2021-04-21  3:21 ` [RFC PATCH 5/7] virtio_ring: introduce virtqueue_desc_add_split() Jason Wang
@ 2021-04-21  3:21 ` Jason Wang
  2021-04-21  3:21 ` [RFC PATCH 7/7] virtio-ring: store DMA metadata in desc_extra for split virtqueue Jason Wang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2021-04-21  3:21 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm

Use an error label for unwinding in __vring_new_virtqueue(). This is
useful for future refactoring.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 11dfa0dc8ec1..9800f1c9ce4c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2137,10 +2137,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 
 	vq->split.desc_state = kmalloc_array(vring.num,
 			sizeof(struct vring_desc_state_split), GFP_KERNEL);
-	if (!vq->split.desc_state) {
-		kfree(vq);
-		return NULL;
-	}
+	if (!vq->split.desc_state)
+		goto err_state;
 
 	/* Put everything in free lists. */
 	vq->free_head = 0;
@@ -2151,6 +2149,10 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 	return &vq->vq;
+
+err_state:
+	kfree(vq);
+	return NULL;
 }
 EXPORT_SYMBOL_GPL(__vring_new_virtqueue);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH 7/7] virtio-ring: store DMA metadata in desc_extra for split virtqueue
  2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
                   ` (5 preceding siblings ...)
  2021-04-21  3:21 ` [RFC PATCH 6/7] virtio: use err label in __vring_new_virtqueue() Jason Wang
@ 2021-04-21  3:21 ` Jason Wang
  2021-04-22  6:31 ` [RFC PATCH 0/7] Untrusted device support for virtio Christoph Hellwig
  2021-04-28 21:06 ` Konrad Rzeszutek Wilk
  8 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2021-04-21  3:21 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm

For the split virtqueue, we used to depend on the address, length and
flags stored in the descriptor ring for DMA unmapping. This is unsafe
when we don't trust the device, since the device can try to manipulate
the behavior of the virtio driver and swiotlb.

For safety, maintain the DMA address, DMA length, descriptor flags and
next field of the non-indirect descriptors in vring_desc_extra when
the DMA API is used for virtio, as we did for the packed virtqueue,
and use that metadata for performing DMA operations. Indirect
descriptors should be safe since they use streaming mappings.

For devices that don't use the DMA API, the behavior of using the
descriptor table is kept unchanged to minimize the performance impact.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 100 +++++++++++++++++++++++++++++------
 1 file changed, 84 insertions(+), 16 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9800f1c9ce4c..b53ceb65f9cf 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -130,6 +130,7 @@ struct vring_virtqueue {
 
 			/* Per-descriptor state. */
 			struct vring_desc_state_split *desc_state;
+			struct vring_desc_extra *desc_extra;
 
 			/* DMA address and size information */
 			dma_addr_t queue_dma_addr;
@@ -364,8 +365,8 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
  * Split ring specific functions - *_split().
  */
 
-static void vring_unmap_one_split(const struct vring_virtqueue *vq,
-				  struct vring_desc *desc)
+static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
+					   struct vring_desc *desc)
 {
 	u16 flags;
 
@@ -389,6 +390,35 @@ static void vring_unmap_one_split(const struct vring_virtqueue *vq,
 	}
 }
 
+static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
+					  unsigned int i)
+{
+	struct vring_desc_extra *extra = vq->split.desc_extra;
+	u16 flags;
+
+	if (!vq->use_dma_api)
+		goto out;
+
+	flags = extra[i].flags;
+
+	if (flags & VRING_DESC_F_INDIRECT) {
+		dma_unmap_single(vring_dma_dev(vq),
+				 extra[i].addr,
+				 extra[i].len,
+				 (flags & VRING_DESC_F_WRITE) ?
+				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	} else {
+		dma_unmap_page(vring_dma_dev(vq),
+			       extra[i].addr,
+			       extra[i].len,
+			       (flags & VRING_DESC_F_WRITE) ?
+			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	}
+
+out:
+	return extra[i].next;
+}
+
 static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
 					       unsigned int total_sg,
 					       gfp_t gfp)
@@ -417,13 +447,28 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
 						    unsigned int i,
 						    dma_addr_t addr,
 						    unsigned int len,
-						    u16 flags)
+						    u16 flags,
+						    bool trust)
 {
+	struct vring_virtqueue *vring = to_vvq(vq);
+	struct vring_desc_extra *extra = vring->split.desc_extra;
+	u16 next;
+
 	desc[i].flags = cpu_to_virtio16(vq->vdev, flags);
 	desc[i].addr = cpu_to_virtio64(vq->vdev, addr);
 	desc[i].len = cpu_to_virtio32(vq->vdev, len);
 
-	return virtio16_to_cpu(vq->vdev, desc[i].next);
+	if (!trust) {
+		next = extra[i].next;
+		desc[i].next = cpu_to_virtio16(vq->vdev, next);
+
+		extra[i].addr = addr;
+		extra[i].len = len;
+		extra[i].flags = flags;
+	} else
+		next = virtio16_to_cpu(vq->vdev, desc[i].next);
+
+	return next;
 }
 
 static inline int virtqueue_add_split(struct virtqueue *_vq,
@@ -499,8 +544,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 				goto unmap_release;
 
 			prev = i;
+			/* Note that we trust indirect descriptor
+			 * table since it use stream DMA mapping.
+			 */
 			i = virtqueue_add_desc_split(_vq, desc, i, addr, sg->length,
-						     VRING_DESC_F_NEXT);
+						     VRING_DESC_F_NEXT,
+						     indirect || !vq->use_dma_api);
 		}
 	}
 	for (; n < (out_sgs + in_sgs); n++) {
@@ -510,14 +559,21 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 				goto unmap_release;
 
 			prev = i;
+			/* Note that we trust indirect descriptor
+			 * table since it use stream DMA mapping.
+			 */
 			i = virtqueue_add_desc_split(_vq, desc, i, addr,
 						     sg->length,
 						     VRING_DESC_F_NEXT |
-						     VRING_DESC_F_WRITE);
+						     VRING_DESC_F_WRITE,
+						     indirect || !vq->use_dma_api);
 		}
 	}
 	/* Last one doesn't continue. */
 	desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
+	if (!indirect && vq->use_dma_api)
+		vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags =
+			~VRING_DESC_F_NEXT;
 
 	if (indirect) {
 		/* Now that the indirect table is filled in, map it. */
@@ -530,7 +586,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		virtqueue_add_desc_split(_vq, vq->split.vring.desc,
 					 head, addr,
 					 total_sg * sizeof(struct vring_desc),
-			                 VRING_DESC_F_INDIRECT);
+					 VRING_DESC_F_INDIRECT,
+					 !vq->use_dma_api);
 	}
 
 	/* We're using some buffers from the free list. */
@@ -538,8 +595,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
 	/* Update free pointer */
 	if (indirect)
-		vq->free_head = virtio16_to_cpu(_vq->vdev,
-					vq->split.vring.desc[head].next);
+		vq->free_head = vq->split.desc_extra[head].next;
 	else
 		vq->free_head = i;
 
@@ -584,8 +640,11 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	for (n = 0; n < total_sg; n++) {
 		if (i == err_idx)
 			break;
-		vring_unmap_one_split(vq, &desc[i]);
-		i = virtio16_to_cpu(_vq->vdev, desc[i].next);
+		if (indirect) {
+			vring_unmap_one_split_indirect(vq, &desc[i]);
+			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
+		} else
+			i = vring_unmap_one_split(vq, i);
 	}
 
 	if (indirect)
@@ -639,14 +698,15 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 	i = head;
 
 	while (vq->split.vring.desc[i].flags & nextflag) {
-		vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
-		i = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].next);
+		vring_unmap_one_split(vq, i);
+		i = vq->split.desc_extra[i].next;
 		vq->vq.num_free++;
 	}
 
-	vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
+	vring_unmap_one_split(vq, i);
 	vq->split.vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev,
 						vq->free_head);
+	vq->split.desc_extra[i].next = vq->free_head;
 	vq->free_head = head;
 
 	/* Plus final descriptor */
@@ -669,7 +729,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
 		for (j = 0; j < len / sizeof(struct vring_desc); j++)
-			vring_unmap_one_split(vq, &indir_desc[j]);
+			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
 
 		kfree(indir_desc);
 		vq->split.desc_state[head].indir_desc = NULL;
@@ -2140,6 +2200,10 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	if (!vq->split.desc_state)
 		goto err_state;
 
+	vq->split.desc_extra = vring_alloc_desc_extra(vq, vring.num);
+	if (!vq->split.desc_extra)
+		goto err_extra;
+
 	/* Put everything in free lists. */
 	vq->free_head = 0;
 	for (i = 0; i < vring.num-1; i++)
@@ -2150,6 +2214,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 	return &vq->vq;
 
+err_extra:
+	kfree(vq->split.desc_state);
 err_state:
 	kfree(vq);
 	return NULL;
@@ -2233,8 +2299,10 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 					 vq->split.queue_dma_addr);
 		}
 	}
-	if (!vq->packed_ring)
+	if (!vq->packed_ring) {
 		kfree(vq->split.desc_state);
+		kfree(vq->split.desc_extra);
+	}
 	list_del(&_vq->list);
 	kfree(vq);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/7] Untrusted device support for virtio
  2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
                   ` (6 preceding siblings ...)
  2021-04-21  3:21 ` [RFC PATCH 7/7] virtio-ring: store DMA metadata in desc_extra for split virtqueue Jason Wang
@ 2021-04-22  6:31 ` Christoph Hellwig
  2021-04-22  8:19   ` Jason Wang
  2021-04-28 21:06 ` Konrad Rzeszutek Wilk
  8 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2021-04-22  6:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm

On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
> The behaivor for non DMA API is kept for minimizing the performance
> impact.

NAK.  Everyone should be using the DMA API in a modern world.  So
treating the DMA API path worse than the broken legacy path does not
make any sense whatsoever.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/7] Untrusted device support for virtio
  2021-04-22  6:31 ` [RFC PATCH 0/7] Untrusted device support for virtio Christoph Hellwig
@ 2021-04-22  8:19   ` Jason Wang
  2021-04-23 20:14     ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2021-04-22  8:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mst, virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, konrad.wilk, kvm


On 2021/4/22 at 2:31 PM, Christoph Hellwig wrote:
> On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
>> The behaivor for non DMA API is kept for minimizing the performance
>> impact.
> NAK.  Everyone should be using the DMA API in a modern world.  So
> treating the DMA API path worse than the broken legacy path does not
> make any sense whatsoever.


I think the goal is not to treat the DMA API path worse than the
legacy one. The issue is that the management layer should guarantee
that ACCESS_PLATFORM is set, so the DMA API is guaranteed to be used
by the driver. So I'm not sure how much value we can gain from trying
to 'fix' the legacy path. But I can change the behavior of the legacy
path to match the DMA API path.

Thanks


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/7] Untrusted device support for virtio
  2021-04-22  8:19   ` Jason Wang
@ 2021-04-23 20:14     ` Michael S. Tsirkin
  2021-04-25  1:43       ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2021-04-23 20:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, virtualization, linux-kernel, xieyongji,
	stefanha, file, ashish.kalra, martin.radev, konrad.wilk, kvm

On Thu, Apr 22, 2021 at 04:19:16PM +0800, Jason Wang wrote:
> 
> On 2021/4/22 at 2:31 PM, Christoph Hellwig wrote:
> > On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
> > > The behaivor for non DMA API is kept for minimizing the performance
> > > impact.
> > NAK.  Everyone should be using the DMA API in a modern world.  So
> > treating the DMA API path worse than the broken legacy path does not
> > make any sense whatsoever.
> 
> 
> I think the goal is not treat DMA API path worse than legacy. The issue is
> that the management layer should guarantee that ACCESS_PLATFORM is set so
> DMA API is guaranteed to be used by the driver. So I'm not sure how much
> value we can gain from trying to 'fix' the legacy path. But I can change the
> behavior of legacy path to match DMA API path.
> 
> Thanks

I think before we maintain different paths with/without ACCESS_PLATFORM
it's worth checking whether it's even a net gain. Avoiding sharing
by storing data in private memory can actually turn out to be
a net gain even without DMA API.

It is worth checking what is the performance effect of this patch.


-- 
MST


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/7] Untrusted device support for virtio
  2021-04-23 20:14     ` Michael S. Tsirkin
@ 2021-04-25  1:43       ` Jason Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2021-04-25  1:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, virtualization, linux-kernel, xieyongji,
	stefanha, file, ashish.kalra, martin.radev, konrad.wilk, kvm


On 2021/4/24 at 4:14 AM, Michael S. Tsirkin wrote:
> On Thu, Apr 22, 2021 at 04:19:16PM +0800, Jason Wang wrote:
>> On 2021/4/22 at 2:31 PM, Christoph Hellwig wrote:
>>> On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
>>>> The behaivor for non DMA API is kept for minimizing the performance
>>>> impact.
>>> NAK.  Everyone should be using the DMA API in a modern world.  So
>>> treating the DMA API path worse than the broken legacy path does not
>>> make any sense whatsoever.
>>
>> I think the goal is not treat DMA API path worse than legacy. The issue is
>> that the management layer should guarantee that ACCESS_PLATFORM is set so
>> DMA API is guaranteed to be used by the driver. So I'm not sure how much
>> value we can gain from trying to 'fix' the legacy path. But I can change the
>> behavior of legacy path to match DMA API path.
>>
>> Thanks
> I think before we maintain different paths with/without ACCESS_PLATFORM
> it's worth checking whether it's even a net gain. Avoiding sharing
> by storing data in private memory can actually turn out to be
> a net gain even without DMA API.


I agree.


>
> It is worth checking what is the performance effect of this patch.


So I've posted v2, where private memory is used in the non-DMA-API
path (as has been done for packed).

Pktgen and netperf don't show an obvious difference.

Thanks


>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/7] Untrusted device support for virtio
  2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
                   ` (7 preceding siblings ...)
  2021-04-22  6:31 ` [RFC PATCH 0/7] Untrusted device support for virtio Christoph Hellwig
@ 2021-04-28 21:06 ` Konrad Rzeszutek Wilk
  2021-04-29  4:16   ` Jason Wang
  8 siblings, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2021-04-28 21:06 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, kvm

On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
> Hi All:
> 
> Sometimes, the driver doesn't trust the device. This is usually
> happens for the encrtpyed VM or VDUSE[1]. In both cases, technology
> like swiotlb is used to prevent the poking/mangling of memory from the
> device. But this is not sufficient since current virtio driver may
> trust what is stored in the descriptor table (coherent mapping) for
> performing the DMA operations like unmap and bounce so the device may
> choose to utilize the behaviour of swiotlb to perform attacks[2].

We fixed it in SWIOTLB. That is, it saves the expected length
of the DMA operation. See

commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155 
Author: Martin Radev <martin.b.radev@gmail.com>
Date:   Tue Jan 12 16:07:29 2021 +0100

    swiotlb: Validate bounce size in the sync/unmap path
    
    The size of the buffer being bounced is not checked if it happens
    to be larger than the size of the mapped buffer. Because the size
    can be controlled by a device, as it's the case with virtio devices,
    this can lead to memory corruption.
    

> 
> For double insurance, to protect from a malicous device, when DMA API
> is used for the device, this series store and use the descriptor
> metadata in an auxiliay structure which can not be accessed via
> swiotlb instead of the ones in the descriptor table. Actually, we've

Sorry for being dense here, but how would SWIOTLB be utilized for
this attack?

> almost achieved that through packed virtqueue and we just need to fix
> a corner case of handling mapping errors. For split virtqueue we just
> follow what's done in the packed.
> 
> Note that we don't duplicate descriptor medata for indirect
> descriptors since it uses stream mapping which is read only so it's
> safe if the metadata of non-indirect descriptors are correct.
> 
> The behaivor for non DMA API is kept for minimizing the performance
> impact.
> 
> Slightly tested with packed on/off, iommu on/of, swiotlb force/off in
> the guest.
> 
> Please review.
> 
> [1] https://lore.kernel.org/netdev/fab615ce-5e13-a3b3-3715-a4203b4ab010@redhat.com/T/
> [2] https://yhbt.net/lore/all/c3629a27-3590-1d9f-211b-c0b7be152b32@redhat.com/T/#mc6b6e2343cbeffca68ca7a97e0f473aaa871c95b
> 
> Jason Wang (7):
>   virtio-ring: maintain next in extra state for packed virtqueue
>   virtio_ring: rename vring_desc_extra_packed
>   virtio-ring: factor out desc_extra allocation
>   virtio_ring: secure handling of mapping errors
>   virtio_ring: introduce virtqueue_desc_add_split()
>   virtio: use err label in __vring_new_virtqueue()
>   virtio-ring: store DMA metadata in desc_extra for split virtqueue
> 
>  drivers/virtio/virtio_ring.c | 189 ++++++++++++++++++++++++++---------
>  1 file changed, 141 insertions(+), 48 deletions(-)
> 
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/7] Untrusted device support for virtio
  2021-04-28 21:06 ` Konrad Rzeszutek Wilk
@ 2021-04-29  4:16   ` Jason Wang
  2021-06-04 15:17     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2021-04-29  4:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: mst, virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, kvm


On 2021/4/29 at 5:06 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
>> Hi All:
>>
>> Sometimes, the driver doesn't trust the device. This is usually
>> happens for the encrtpyed VM or VDUSE[1]. In both cases, technology
>> like swiotlb is used to prevent the poking/mangling of memory from the
>> device. But this is not sufficient since current virtio driver may
>> trust what is stored in the descriptor table (coherent mapping) for
>> performing the DMA operations like unmap and bounce so the device may
>> choose to utilize the behaviour of swiotlb to perform attacks[2].
> We fixed it in the SWIOTLB. That is it saves the expected length
> of the DMA operation. See
>
> commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155
> Author: Martin Radev <martin.b.radev@gmail.com>
> Date:   Tue Jan 12 16:07:29 2021 +0100
>
>      swiotlb: Validate bounce size in the sync/unmap path
>      
>      The size of the buffer being bounced is not checked if it happens
>      to be larger than the size of the mapped buffer. Because the size
>      can be controlled by a device, as it's the case with virtio devices,
>      this can lead to memory corruption.
>      


Good to know this, but this series tries to protect at a different
level, and I believe such protection needs to be done at both levels.


>> For double insurance, to protect from a malicous device, when DMA API
>> is used for the device, this series store and use the descriptor
>> metadata in an auxiliay structure which can not be accessed via
>> swiotlb instead of the ones in the descriptor table. Actually, we've
> Sorry for being dense here, but how wold SWIOTLB be utilized for
> this attack?


So we still have behaviors that are triggered by a device that is not
trusted. Such behavior is what this series tries to avoid. We've learnt
a lot of lessons about eliminating such potential attacks, and it would
be too late to fix things if we found another issue in SWIOTLB.

Proving that "the unexpected device-triggered behavior is safe" is much
harder (or even impossible) than "eliminating the unexpected
device-triggered behavior entirely".

E.g. I wonder whether something like this can happen: consider that
the DMA direction of the unmap is under the control of the device. The
device can cheat SWIOTLB by changing the flag in order to modify a
buffer that should be read-only for the device. If yes, is it really
safe?
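
To make the concern concrete, here is a simplified sketch of the
split-ring unmap path (not the exact driver code; it only shows where
the direction and the other unmap parameters come from):

/* Unsafe: address, length and direction all come from the coherent,
 * device-writable descriptor ring, so the device can flip
 * VRING_DESC_F_WRITE (and the rest) between map and unmap.
 */
flags = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].flags);
dma_unmap_page(vring_dma_dev(vq),
	       virtio64_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].addr),
	       virtio32_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].len),
	       (flags & VRING_DESC_F_WRITE) ?
	       DMA_FROM_DEVICE : DMA_TO_DEVICE);

/* What this series does instead: take everything from the
 * driver-private desc_extra copy that the device cannot touch.
 */
flags = vq->split.desc_extra[i].flags;
dma_unmap_page(vring_dma_dev(vq),
	       vq->split.desc_extra[i].addr,
	       vq->split.desc_extra[i].len,
	       (flags & VRING_DESC_F_WRITE) ?
	       DMA_FROM_DEVICE : DMA_TO_DEVICE);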

The above patch only logs the bounce size, but it doesn't log the
flag. Even if it logged the flag, SWIOTLB still doesn't know how each
buffer is used or when it's the appropriate (safe) time to unmap the
buffer; only the driver that is using SWIOTLB knows that.

So I think we need to consolidate on both layers instead of solely 
depending on the SWIOTLB.

Thanks


>
>> almost achieved that through packed virtqueue and we just need to fix
>> a corner case of handling mapping errors. For split virtqueue we just
>> follow what's done in the packed.
>>
>> Note that we don't duplicate descriptor medata for indirect
>> descriptors since it uses stream mapping which is read only so it's
>> safe if the metadata of non-indirect descriptors are correct.
>>
>> The behaivor for non DMA API is kept for minimizing the performance
>> impact.
>>
>> Slightly tested with packed on/off, iommu on/of, swiotlb force/off in
>> the guest.
>>
>> Please review.
>>
>> [1] https://lore.kernel.org/netdev/fab615ce-5e13-a3b3-3715-a4203b4ab010@redhat.com/T/
>> [2] https://yhbt.net/lore/all/c3629a27-3590-1d9f-211b-c0b7be152b32@redhat.com/T/#mc6b6e2343cbeffca68ca7a97e0f473aaa871c95b
>>
>> Jason Wang (7):
>>    virtio-ring: maintain next in extra state for packed virtqueue
>>    virtio_ring: rename vring_desc_extra_packed
>>    virtio-ring: factor out desc_extra allocation
>>    virtio_ring: secure handling of mapping errors
>>    virtio_ring: introduce virtqueue_desc_add_split()
>>    virtio: use err label in __vring_new_virtqueue()
>>    virtio-ring: store DMA metadata in desc_extra for split virtqueue
>>
>>   drivers/virtio/virtio_ring.c | 189 ++++++++++++++++++++++++++---------
>>   1 file changed, 141 insertions(+), 48 deletions(-)
>>
>> -- 
>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/7] Untrusted device support for virtio
  2021-04-29  4:16   ` Jason Wang
@ 2021-06-04 15:17     ` Konrad Rzeszutek Wilk
  2021-06-07  2:46       ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2021-06-04 15:17 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, kvm

On 4/29/21 12:16 AM, Jason Wang wrote:
> 
> On 2021/4/29 at 5:06 AM, Konrad Rzeszutek Wilk wrote:
>> On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
>>> Hi All:
>>>
>>> Sometimes, the driver doesn't trust the device. This is usually
>>> happens for the encrtpyed VM or VDUSE[1]. In both cases, technology
>>> like swiotlb is used to prevent the poking/mangling of memory from the
>>> device. But this is not sufficient since current virtio driver may
>>> trust what is stored in the descriptor table (coherent mapping) for
>>> performing the DMA operations like unmap and bounce so the device may
>>> choose to utilize the behaviour of swiotlb to perform attacks[2].
>> We fixed it in the SWIOTLB. That is it saves the expected length
>> of the DMA operation. See
>>
>> commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155
>> Author: Martin Radev <martin.b.radev@gmail.com>
>> Date:   Tue Jan 12 16:07:29 2021 +0100
>>
>>      swiotlb: Validate bounce size in the sync/unmap path
>>      The size of the buffer being bounced is not checked if it happens
>>      to be larger than the size of the mapped buffer. Because the size
>>      can be controlled by a device, as it's the case with virtio devices,
>>      this can lead to memory corruption.
> 
> 
> Good to know this, but this series tries to protect at different level. 
> And I believe such protection needs to be done at both levels.
> 

My apologies for taking so long to respond, somehow this disappeared in 
one of the folders.
> 
>>> For double insurance, to protect from a malicous device, when DMA API
>>> is used for the device, this series store and use the descriptor
>>> metadata in an auxiliay structure which can not be accessed via
>>> swiotlb instead of the ones in the descriptor table. Actually, we've
>> Sorry for being dense here, but how wold SWIOTLB be utilized for
>> this attack?
> 
> 
> So we still behaviors that is triggered by device that is not trusted. 
> Such behavior is what the series tries to avoid. We've learnt a lot of 
> lessons to eliminate the potential attacks via this. And it would be too 
> late to fix if we found another issue of SWIOTLB.
> 
> Proving "the unexpected device triggered behavior is safe" is very hard 
> (or even impossible) than "eliminating the unexpected device triggered 
> behavior totally".
> 
> E.g I wonder whether something like this can happen: Consider the DMA 
> direction of unmap is under the control of device. The device can cheat 
> the SWIOTLB by changing the flag to modify the device read only buffer. 

<blinks> Why would you want to expose that to the device? And wouldn't 
that be specific to Linux devices - because surely Windows DMA APIs are 
different and this 'flag' seems very Linux-kernel specific?

> If yes, it is really safe?

Well no? But neither is rm -Rf / but we still allow folks to do that.
> 
> The above patch only log the bounce size but it doesn't log the flag. 

It logs and panics the system.

> Even if it logs the flag, SWIOTLB still doesn't know how each buffer is 
> used and when it's the appropriate(safe) time to unmap the buffer, only 
> the driver that is using the SWIOTLB know them.

Fair enough. Is the intent to do the same thing for all the other
drivers that could be running in an encrypted guest and would require
SWIOTLB?

Like legacy devices that KVM can expose (floppy driver?, SVGA driver)?

> 
> So I think we need to consolidate on both layers instead of solely 
> depending on the SWIOTLB.

Please make sure that this explanation is part of the cover letter
or of the commit/Kconfig text.

Also, are you aware of the patchset that Andi has been working on that
tries to make the DMA code have extra bells and whistles for this
purpose?

Thank you.
> Thanks
> 
> 
>>
>>> almost achieved that through packed virtqueue and we just need to fix
>>> a corner case of handling mapping errors. For split virtqueue we just
>>> follow what's done in the packed.
>>>
>>> Note that we don't duplicate descriptor medata for indirect
>>> descriptors since it uses stream mapping which is read only so it's
>>> safe if the metadata of non-indirect descriptors are correct.
>>>
>>> The behaivor for non DMA API is kept for minimizing the performance
>>> impact.
>>>
>>> Slightly tested with packed on/off, iommu on/of, swiotlb force/off in
>>> the guest.
>>>
>>> Please review.
>>>
>>> [1] 
>>> https://lore.kernel.org/netdev/fab615ce-5e13-a3b3-3715-a4203b4ab010@redhat.com/T/ 
>>>
>>> [2] 
>>> https://yhbt.net/lore/all/c3629a27-3590-1d9f-211b-c0b7be152b32@redhat.com/T/#mc6b6e2343cbeffca68ca7a97e0f473aaa871c95b 
>>>
>>>
>>> Jason Wang (7):
>>>    virtio-ring: maintain next in extra state for packed virtqueue
>>>    virtio_ring: rename vring_desc_extra_packed
>>>    virtio-ring: factor out desc_extra allocation
>>>    virtio_ring: secure handling of mapping errors
>>>    virtio_ring: introduce virtqueue_desc_add_split()
>>>    virtio: use err label in __vring_new_virtqueue()
>>>    virtio-ring: store DMA metadata in desc_extra for split virtqueue
>>>
>>>   drivers/virtio/virtio_ring.c | 189 ++++++++++++++++++++++++++---------
>>>   1 file changed, 141 insertions(+), 48 deletions(-)
>>>
>>> -- 
>>> 2.25.1
>>>
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/7] Untrusted device support for virtio
  2021-06-04 15:17     ` Konrad Rzeszutek Wilk
@ 2021-06-07  2:46       ` Jason Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2021-06-07  2:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: mst, virtualization, linux-kernel, xieyongji, stefanha, file,
	ashish.kalra, martin.radev, kvm


On 2021/6/4 at 11:17 PM, Konrad Rzeszutek Wilk wrote:
> On 4/29/21 12:16 AM, Jason Wang wrote:
>>
>> On 2021/4/29 at 5:06 AM, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
>>>> Hi All:
>>>>
>>>> Sometimes, the driver doesn't trust the device. This is usually
>>>> happens for the encrtpyed VM or VDUSE[1]. In both cases, technology
>>>> like swiotlb is used to prevent the poking/mangling of memory from the
>>>> device. But this is not sufficient since current virtio driver may
>>>> trust what is stored in the descriptor table (coherent mapping) for
>>>> performing the DMA operations like unmap and bounce so the device may
>>>> choose to utilize the behaviour of swiotlb to perform attacks[2].
>>> We fixed it in the SWIOTLB. That is it saves the expected length
>>> of the DMA operation. See
>>>
>>> commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155
>>> Author: Martin Radev <martin.b.radev@gmail.com>
>>> Date:   Tue Jan 12 16:07:29 2021 +0100
>>>
>>>      swiotlb: Validate bounce size in the sync/unmap path
>>>      The size of the buffer being bounced is not checked if it happens
>>>      to be larger than the size of the mapped buffer. Because the size
>>>      can be controlled by a device, as it's the case with virtio 
>>> devices,
>>>      this can lead to memory corruption.
>>
>>
>> Good to know this, but this series tries to protect at different 
>> level. And I believe such protection needs to be done at both levels.
>>
>
> My apologies for taking so long to respond, somehow this disappeared 
> in one of the folders.


No problem.


>>
>>>> For double insurance, to protect from a malicous device, when DMA API
>>>> is used for the device, this series store and use the descriptor
>>>> metadata in an auxiliay structure which can not be accessed via
>>>> swiotlb instead of the ones in the descriptor table. Actually, we've
>>> Sorry for being dense here, but how wold SWIOTLB be utilized for
>>> this attack?
>>
>>
>> So we still behaviors that is triggered by device that is not 
>> trusted. Such behavior is what the series tries to avoid. We've 
>> learnt a lot of lessons to eliminate the potential attacks via this. 
>> And it would be too late to fix if we found another issue of SWIOTLB.
>>
>> Proving "the unexpected device triggered behavior is safe" is very 
>> hard (or even impossible) than "eliminating the unexpected device 
>> triggered behavior totally".
>>
>> E.g I wonder whether something like this can happen: Consider the DMA 
>> direction of unmap is under the control of device. The device can 
>> cheat the SWIOTLB by changing the flag to modify the device read only 
>> buffer. 
>
> <blinks> Why would you want to expose that to the device? And wouldn't 
> that be specific to Linux devices - because surely Windows DMA APIs 
> are different and this 'flag' seems very Linux-kernel specific?


Just to make sure we are on the same page: by "flag" I actually mean
the virtio descriptor flag, which could be modified by the device. The
driver then deduces the DMA API direction from that descriptor flag.


>
>> If yes, it is really safe?
>
> Well no? But neither is rm -Rf / but we still allow folks to do that.
>>
>> The above patch only log the bounce size but it doesn't log the flag. 
>
> It logs and panics the system.


Good to know that.


>
>> Even if it logs the flag, SWIOTLB still doesn't know how each buffer 
>> is used and when it's the appropriate(safe) time to unmap the buffer, 
>> only the driver that is using the SWIOTLB know them.
>
> Fair enough. Is the intent to do the same thing for all the other 
> drivers that could be running in an encrypted guest and would require 
> SWIOTLB.
>
> Like legacy devices that KVM can expose (floppy driver?, SVGA driver)?


My understanding is that we shouldn't enable the legacy devices at all 
in this case.

Note that virtio has been extended to various types of devices (we can 
boot qemu without PCI and legacy devices (e.g the micro VM))

- virtio input
- virtio gpu
- virtio sound
...

I'm not sure whether we need floppy, but it's not hard to have a 
virtio-floppy if necessary

So it would be sufficient for us to audit/harden the virtio drivers.


>
>>
>> So I think we need to consolidate on both layers instead of solely 
>> depending on the SWIOTLB.
>
> Please make sure that this explanation is in part of the cover letter
> or in the commit/Kconfig.


I will do that if the series needs a respin.


>
> Also, are you aware of the patchset than Andi been working on that 
> tries to make the DMA code to have extra bells and whistles for this 
> purpose?


Yes, but as described above they are not duplicated. Protection at both 
levels would be optimal.

Another note is that this series is not only about DMA/swiotlb; it
eliminates all the possible attacks via the descriptor ring.

(One example is the attack via descriptor.next.)
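
For instance, a rough before/after sketch based on the
detach_buf_split() hunk in patch 7 (simplified, not the complete
function): the old code walked the free chain via a next field the
device can rewrite, while the series walks the driver-private copy.

/* Before: the chain walk is driven by device-writable ring memory. */
while (vq->split.vring.desc[i].flags & nextflag) {
	vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
	i = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].next);
	vq->vq.num_free++;
}

/* After: the chain walk is driven by the driver-private desc_extra. */
while (vq->split.vring.desc[i].flags & nextflag) {
	vring_unmap_one_split(vq, i);
	i = vq->split.desc_extra[i].next;
	vq->vq.num_free++;
}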

Thanks


>
> Thank you.
>> Thanks
>>
>>
>>>
>>>> almost achieved that through packed virtqueue and we just need to fix
>>>> a corner case of handling mapping errors. For split virtqueue we just
>>>> follow what's done in the packed.
>>>>
>>>> Note that we don't duplicate descriptor medata for indirect
>>>> descriptors since it uses stream mapping which is read only so it's
>>>> safe if the metadata of non-indirect descriptors are correct.
>>>>
>>>> The behaivor for non DMA API is kept for minimizing the performance
>>>> impact.
>>>>
>>>> Slightly tested with packed on/off, iommu on/of, swiotlb force/off in
>>>> the guest.
>>>>
>>>> Please review.
>>>>
>>>> [1] 
>>>> https://lore.kernel.org/netdev/fab615ce-5e13-a3b3-3715-a4203b4ab010@redhat.com/T/ 
>>>>
>>>> [2] 
>>>> https://yhbt.net/lore/all/c3629a27-3590-1d9f-211b-c0b7be152b32@redhat.com/T/#mc6b6e2343cbeffca68ca7a97e0f473aaa871c95b 
>>>>
>>>>
>>>> Jason Wang (7):
>>>>    virtio-ring: maintain next in extra state for packed virtqueue
>>>>    virtio_ring: rename vring_desc_extra_packed
>>>>    virtio-ring: factor out desc_extra allocation
>>>>    virtio_ring: secure handling of mapping errors
>>>>    virtio_ring: introduce virtqueue_desc_add_split()
>>>>    virtio: use err label in __vring_new_virtqueue()
>>>>    virtio-ring: store DMA metadata in desc_extra for split virtqueue
>>>>
>>>>   drivers/virtio/virtio_ring.c | 189 
>>>> ++++++++++++++++++++++++++---------
>>>>   1 file changed, 141 insertions(+), 48 deletions(-)
>>>>
>>>> -- 
>>>> 2.25.1
>>>>
>>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-06-07  2:47 UTC | newest]

Thread overview: 16+ messages
2021-04-21  3:21 [RFC PATCH 0/7] Untrusted device support for virtio Jason Wang
2021-04-21  3:21 ` [RFC PATCH 1/7] virtio-ring: maintain next in extra state for packed virtqueue Jason Wang
2021-04-21  3:21 ` [RFC PATCH 2/7] virtio_ring: rename vring_desc_extra_packed Jason Wang
2021-04-21  3:21 ` [RFC PATCH 3/7] virtio-ring: factor out desc_extra allocation Jason Wang
2021-04-21  3:21 ` [RFC PATCH 4/7] virtio_ring: secure handling of mapping errors Jason Wang
2021-04-21  3:21 ` [RFC PATCH 5/7] virtio_ring: introduce virtqueue_desc_add_split() Jason Wang
2021-04-21  3:21 ` [RFC PATCH 6/7] virtio: use err label in __vring_new_virtqueue() Jason Wang
2021-04-21  3:21 ` [RFC PATCH 7/7] virtio-ring: store DMA metadata in desc_extra for split virtqueue Jason Wang
2021-04-22  6:31 ` [RFC PATCH 0/7] Untrusted device support for virtio Christoph Hellwig
2021-04-22  8:19   ` Jason Wang
2021-04-23 20:14     ` Michael S. Tsirkin
2021-04-25  1:43       ` Jason Wang
2021-04-28 21:06 ` Konrad Rzeszutek Wilk
2021-04-29  4:16   ` Jason Wang
2021-06-04 15:17     ` Konrad Rzeszutek Wilk
2021-06-07  2:46       ` Jason Wang
