* Virtio hardening for TDX
@ 2021-06-03  0:41 ` Andi Kleen
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel

[v1: Initial post]

With confidential computing like TDX the guest no longer trusts the
host. The host can of course still deny service (DoS), but it is not
allowed to read or write any guest memory not explicitly shared with it.

This has implications for virtio. Traditionally virtio did not assume
that the other side of the communication channel is malicious, and
therefore did not do any boundary checks on the virtio ring data
structures.

This patchkit hardens virtio. In a TDX-like model the only host memory
accesses allowed are to the virtio ring and to the (forced) swiotlb
bounce buffer.

This patchkit makes various changes to ensure there can be no access
outside these two areas. It is still possible for the host to break the
communication, but that should only result in an IO error in the guest,
not in a memory safety violation.

virtio is quite complicated and has many modes. To simplify the task we
enforce that virtio only runs in split mode without indirect
descriptors when running as a TDX guest. We also enforce use of the
DMA API.

These code paths are then hardened against any corruption of the ring.

This patchkit has components in three subsystems:
- Hardening changes to virtio, all in the generic virtio ring code
- Hardening changes to kernel/dma swiotlb to protect swiotlb against
  malicious pointers. This requires an API change that needed a tree sweep.
- A single x86 patch to enable arch_has_restricted_virtio_memory_access
  for TDX

It depends on Sathya's earlier patchkit that adds the basic infrastructure
for TDX. This is only needed for the "am I running in TDX" part.





* [PATCH v1 1/8] virtio: Force only split mode with protected guest
@ 2021-06-03  0:41   ` Andi Kleen
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel,
	Andi Kleen

When running under TDX the virtio host is untrusted. The bulk of the
kernel memory is encrypted and protected, but the virtio ring lives in
special memory that is explicitly shared with the untrusted host.

This means virtio needs to be hardened against any attacks from the
host through the ring. Of course it's impossible to prevent DoS (the
host can choose at any time to stop doing IO), but there should be no
buffer overruns or similar issues that might give access to any private
memory in the guest.

virtio has a lot of modes, and most of them are difficult to harden.

The best candidate for hardening seems to be split mode without
indirect descriptors. This also simplifies the hardening job because
only a single code path has to be covered.

Only allow split mode when in a protected guest. Follow-on patches
harden the split mode code paths, and we don't want a malicious host to
be able to force anything else. Also disallow indirect descriptors for
similar reasons.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/virtio/virtio_ring.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..f35629fa47b1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -11,6 +11,7 @@
 #include <linux/module.h>
 #include <linux/hrtimer.h>
 #include <linux/dma-mapping.h>
+#include <linux/protected_guest.h>
 #include <xen/xen.h>
 
 #ifdef DEBUG
@@ -2221,8 +2222,16 @@ void vring_transport_features(struct virtio_device *vdev)
 	unsigned int i;
 
 	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
+
+		/*
+		 * In protected guest mode disallow packed or indirect
+		 * because they ain't hardened.
+		 */
+
 		switch (i) {
 		case VIRTIO_RING_F_INDIRECT_DESC:
+			if (protected_guest_has(VM_MEM_ENCRYPT))
+				goto clear;
 			break;
 		case VIRTIO_RING_F_EVENT_IDX:
 			break;
@@ -2231,9 +2240,12 @@ void vring_transport_features(struct virtio_device *vdev)
 		case VIRTIO_F_ACCESS_PLATFORM:
 			break;
 		case VIRTIO_F_RING_PACKED:
+			if (protected_guest_has(VM_MEM_ENCRYPT))
+				goto clear;
 			break;
 		case VIRTIO_F_ORDER_PLATFORM:
 			break;
+		clear:
 		default:
 			/* We don't understand this bit. */
 			__virtio_clear_bit(vdev, i);
-- 
2.25.4
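
For context, the feature bits cleared above take effect during
transport feature negotiation: each transport's finalize_features
callback invokes vring_transport_features() so that virtio_ring can
veto transport feature bits before the negotiated set is written back
to the device. A minimal sketch of that call site, modeled on existing
transports such as virtio-mmio (the function name below is illustrative
only, not actual transport code):

static int example_finalize_features(struct virtio_device *vdev)
{
	/*
	 * Give virtio_ring a chance to accept or clear feature bits.
	 * With the hunk above, a protected guest drops
	 * VIRTIO_RING_F_INDIRECT_DESC and VIRTIO_F_RING_PACKED here,
	 * so only the hardened split ring remains negotiable.
	 */
	vring_transport_features(vdev);

	/* transport-specific feature validation elided */
	return 0;
}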



* [PATCH v1 2/8] virtio: Add boundary checks to virtio ring
@ 2021-06-03  0:41   ` Andi Kleen
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel,
	Andi Kleen

In protected guest mode we don't trust the host.

This means we need to make sure the host cannot subvert us through
virtio communication. In general it can corrupt our virtio data and
cause a DoS, but it should not be able to access any data that is not
explicitly under IO.

Also add boundary checks so that the free list (which is accessible to
the host) cannot point outside the virtio ring. Note that it could
still contain loops or similar, but these should only cause a DoS, not
a memory corruption or leak.

When we detect an out-of-bounds descriptor, trigger an IO error. We
also use a WARN() in case it was a software bug instead of an attack.
This implies that a malicious host can flood the guest kernel log, but
that's only a DoS and acceptable in the threat model.

This patch only hardens the initial consumption of the free list; the
freeing is handled in a later patch.

Any of these errors can cause DMA memory leaks, but there is nothing we
can do about that, and it would again only be a DoS.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f35629fa47b1..d37ff5a0ff58 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -413,6 +413,15 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
 	return desc;
 }
 
+/* assumes no indirect mode */
+static inline bool inside_split_ring(struct vring_virtqueue *vq,
+				     unsigned index)
+{
+	return !WARN(index >= vq->split.vring.num,
+		    "desc index %u out of bounds (%u)\n",
+		    index, vq->split.vring.num);
+}
+
 static inline int virtqueue_add_split(struct virtqueue *_vq,
 				      struct scatterlist *sgs[],
 				      unsigned int total_sg,
@@ -428,6 +437,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	unsigned int i, n, avail, descs_used, prev, err_idx;
 	int head;
 	bool indirect;
+	int io_err;
 
 	START_USE(vq);
 
@@ -481,7 +491,13 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
 	for (n = 0; n < out_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
+			dma_addr_t addr;
+
+			io_err = -EIO;
+			if (!inside_split_ring(vq, i))
+				goto unmap_release;
+			io_err = -ENOMEM;
+			addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
 			if (vring_mapping_error(vq, addr))
 				goto unmap_release;
 
@@ -494,7 +510,13 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	}
 	for (; n < (out_sgs + in_sgs); n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
+			dma_addr_t addr;
+
+			io_err = -EIO;
+			if (!inside_split_ring(vq, i))
+				goto unmap_release;
+			io_err = -ENOMEM;
+			addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
 			if (vring_mapping_error(vq, addr))
 				goto unmap_release;
 
@@ -513,6 +535,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		dma_addr_t addr = vring_map_single(
 			vq, desc, total_sg * sizeof(struct vring_desc),
 			DMA_TO_DEVICE);
+		io_err = -ENOMEM;
 		if (vring_mapping_error(vq, addr))
 			goto unmap_release;
 
@@ -528,6 +551,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	/* We're using some buffers from the free list. */
 	vq->vq.num_free -= descs_used;
 
+	io_err = -EIO;
+	if (!inside_split_ring(vq, head))
+		goto unmap_release;
+
 	/* Update free pointer */
 	if (indirect)
 		vq->free_head = virtio16_to_cpu(_vq->vdev,
@@ -545,6 +572,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
 	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
+
+	if (avail >= vq->split.vring.num)
+		goto unmap_release;
+
 	vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
 
 	/* Descriptors and available array need to be set before we expose the
@@ -576,6 +607,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	for (n = 0; n < total_sg; n++) {
 		if (i == err_idx)
 			break;
+		if (!inside_split_ring(vq, i))
+			break;
 		vring_unmap_one_split(vq, &desc[i]);
 		i = virtio16_to_cpu(_vq->vdev, desc[i].next);
 	}
@@ -584,7 +617,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		kfree(desc);
 
 	END_USE(vq);
-	return -ENOMEM;
+	return io_err;
 }
 
 static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
@@ -1146,7 +1179,12 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	c = 0;
 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
+			dma_addr_t addr;
+
+			if (curr >= vq->packed.vring.num)
+				goto unmap_release;
+
+			addr = vring_map_one_sg(vq, sg, n < out_sgs ?
 					DMA_TO_DEVICE : DMA_FROM_DEVICE);
 			if (vring_mapping_error(vq, addr))
 				goto unmap_release;
-- 
2.25.4
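
To make the threat concrete: the descriptor indices in the free list
live in ring memory that the host can write at any time, so every value
read from there has to be range checked before it is used as an array
index. A standalone illustration of the bound that inside_split_ring()
enforces (plain userspace C, not kernel code; the numbers are made up):

#include <stdbool.h>
#include <stdio.h>

/*
 * Same bound as inside_split_ring(): a descriptor index taken from
 * host-writable ring memory must be below the ring size before it is
 * used to index the descriptor table.
 */
static bool index_inside_ring(unsigned int index, unsigned int ring_num)
{
	return index < ring_num;
}

int main(void)
{
	unsigned int ring_num = 256;	/* example queue size */
	unsigned int next = 1000;	/* value a malicious host could plant */

	if (!index_inside_ring(next, ring_num))
		printf("reject descriptor index %u (ring has %u entries)\n",
		       next, ring_num);
	return 0;
}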



* [PATCH v1 3/8] virtio: Harden split buffer detachment
@ 2021-06-03  0:41   ` Andi Kleen
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel,
	Andi Kleen

Harden the split buffer detachment path by adding boundary checks. Note
that when these checks fail we may fail to unmap some swiotlb mappings,
which could result in a leak and a DoS. But that's acceptable because a
malicious host can DoS us anyway.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/virtio/virtio_ring.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d37ff5a0ff58..1e9aa1e95e1b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -651,12 +651,19 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
 	return needs_kick;
 }
 
-static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
-			     void **ctx)
+static int detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
+			    void **ctx)
 {
 	unsigned int i, j;
 	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
 
+	/* We'll leak DMA mappings when this happens, but nothing
+	 * can be done about that. In the worst case the host
+	 * could DOS us, but it can of course do that anyways.
+	 */
+	if (!inside_split_ring(vq, head))
+		return -EIO;
+
 	/* Clear data ptr. */
 	vq->split.desc_state[head].data = NULL;
 
@@ -666,6 +673,8 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 	while (vq->split.vring.desc[i].flags & nextflag) {
 		vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
 		i = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].next);
+		if (!inside_split_ring(vq, i))
+			return -EIO;
 		vq->vq.num_free++;
 	}
 
@@ -684,7 +693,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 
 		/* Free the indirect table, if any, now that it's unmapped. */
 		if (!indir_desc)
-			return;
+			return 0;
 
 		len = virtio32_to_cpu(vq->vq.vdev,
 				vq->split.vring.desc[head].len);
@@ -701,6 +710,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 	} else if (ctx) {
 		*ctx = vq->split.desc_state[head].indir_desc;
 	}
+	return 0;
 }
 
 static inline bool more_used_split(const struct vring_virtqueue *vq)
@@ -717,6 +727,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 	void *ret;
 	unsigned int i;
 	u16 last_used;
+	int err;
 
 	START_USE(vq);
 
@@ -751,7 +762,12 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 
 	/* detach_buf_split clears data, so grab it now. */
 	ret = vq->split.desc_state[i].data;
-	detach_buf_split(vq, i, ctx);
+	err = detach_buf_split(vq, i, ctx);
+	if (err) {
+		END_USE(vq);
+		return NULL;
+	}
+
 	vq->last_used_idx++;
 	/* If we expect an interrupt for the next entry, tell host
 	 * by writing event index and flush out the write before
@@ -863,6 +879,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
 		/* detach_buf_split clears data, so grab it now. */
 		buf = vq->split.desc_state[i].data;
 		detach_buf_split(vq, i, NULL);
+		/* Don't need to check for error because nothing is returned */
 		vq->split.avail_idx_shadow--;
 		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
 				vq->split.avail_idx_shadow);
-- 
2.25.4



* [PATCH v1 4/8] x86/tdx: Add arch_has_restricted_memory_access for TDX
@ 2021-06-03  0:41   ` Andi Kleen
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel,
	Andi Kleen

In virtio the host decides whether the guest uses the DMA API or not,
using the strangely named VIRTIO_F_ACCESS_PLATFORM feature bit (which
really indicates whether the DMA API should be used).

For hardened virtio on TDX we want to enforce that swiotlb is always
used, which requires using the DMA API. While IO wouldn't really work
without the swiotlb, it might be possible that an attacker forces
swiotlb-less IO to manipulate memory in the guest.

So we want to force the DMA API (which then forces swiotlb),
but without relying on the host.

There is already an arch_has_restricted_virtio_memory_access hook for
this, which is currently used only by s390. Enable the config option
for the hook on x86 and enable it for TDX.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/Kconfig                 | 1 +
 arch/x86/mm/mem_encrypt_common.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1531a0f905ed..3d804fce31b9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -884,6 +884,7 @@ config INTEL_TDX_GUEST
 	select X86_X2APIC
 	select SECURITY_LOCKDOWN_LSM
 	select X86_MEM_ENCRYPT_COMMON
+	select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
 	help
 	  Provide support for running in a trusted domain on Intel processors
 	  equipped with Trusted Domain eXtenstions. TDX is a new Intel
diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
index 24c9117547b4..2244d1f033ab 100644
--- a/arch/x86/mm/mem_encrypt_common.c
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -9,6 +9,7 @@
 
 #include <asm/mem_encrypt_common.h>
 #include <linux/dma-mapping.h>
+#include <linux/virtio_config.h>
 #include <linux/swiotlb.h>
 
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
@@ -37,3 +38,9 @@ void __init mem_encrypt_init(void)
 		amd_mem_encrypt_init();
 }
 
+int arch_has_restricted_virtio_memory_access(void)
+{
+	return is_tdx_guest();
+}
+EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);
+
-- 
2.25.4
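
For reference, the existing consumer of this hook in the virtio core
(originally added for s390 protected guests) looks roughly like the
following during feature finalization; the exact function boundaries
and surrounding checks are elided here:

	if (arch_has_restricted_virtio_memory_access() &&
	    !virtio_has_feature(dev, VIRTIO_F_ACCESS_PLATFORM)) {
		dev_warn(&dev->dev,
			 "device must provide VIRTIO_F_ACCESS_PLATFORM\n");
		return -ENODEV;
	}

With this patch the check also fires on TDX, so a device that tries to
negotiate away VIRTIO_F_ACCESS_PLATFORM (and with it the DMA API and
swiotlb) is rejected instead of silently getting direct access to guest
memory.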



* [PATCH v1 5/8] dma: Use size for swiotlb boundary checks
@ 2021-06-03  0:41   ` Andi Kleen
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel,
	Andi Kleen

swiotlb currently only uses the start address of a DMA buffer to check
whether it is inside the swiotlb or not. But with virtio and an
untrusted host, the host could supply a DMA mapping that crosses the
swiotlb boundaries, potentially leaking or corrupting data. Add size
checks to all the swiotlb checks and reject any DMA that crosses the
swiotlb buffer boundaries.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/iommu/dma-iommu.c   | 13 ++++++-------
 drivers/xen/swiotlb-xen.c   | 11 ++++++-----
 include/linux/dma-mapping.h |  4 ++--
 include/linux/swiotlb.h     |  8 +++++---
 kernel/dma/direct.c         |  8 ++++----
 kernel/dma/direct.h         |  8 ++++----
 kernel/dma/mapping.c        |  4 ++--
 net/xdp/xsk_buff_pool.c     |  2 +-
 8 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..7ef13198721b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 
 	__iommu_dma_unmap(dev, dma_addr, size);
 
-	if (unlikely(is_swiotlb_buffer(phys)))
+	if (unlikely(is_swiotlb_buffer(phys, size)))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
 	}
 
 	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys, org_size))
 		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
 	return iova;
 }
@@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_cpu(phys, size, dir);
 
-	if (is_swiotlb_buffer(phys))
+	if (is_swiotlb_buffer(phys, size))
 		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-	if (is_swiotlb_buffer(phys))
+	if (is_swiotlb_buffer(phys, size))
 		swiotlb_sync_single_for_device(dev, phys, size, dir);
 
 	if (!dev_is_dma_coherent(dev))
@@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 		if (!dev_is_dma_coherent(dev))
 			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-		if (is_swiotlb_buffer(sg_phys(sg)))
+		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
 			swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
 						    sg->length, dir);
 	}
@@ -832,10 +832,9 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
 		return;
 
 	for_each_sg(sgl, sg, nelems, i) {
-		if (is_swiotlb_buffer(sg_phys(sg)))
+		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
 			swiotlb_sync_single_for_device(dev, sg_phys(sg),
 						       sg->length, dir);
-
 		if (!dev_is_dma_coherent(dev))
 			arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
 	}
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 24d11861ac7d..333846af8d35 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -89,7 +89,8 @@ static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
 	return 0;
 }
 
-static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
+static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr,
+				 size_t size)
 {
 	unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
 	unsigned long xen_pfn = bfn_to_local_pfn(bfn);
@@ -100,7 +101,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 	 * in our domain. Therefore _only_ check address within our domain.
 	 */
 	if (pfn_valid(PFN_DOWN(paddr)))
-		return is_swiotlb_buffer(paddr);
+		return is_swiotlb_buffer(paddr, size);
 	return 0;
 }
 
@@ -431,7 +432,7 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 	}
 
 	/* NOTE: We use dev_addr here, not paddr! */
-	if (is_xen_swiotlb_buffer(hwdev, dev_addr))
+	if (is_xen_swiotlb_buffer(hwdev, dev_addr, size))
 		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
 }
 
@@ -448,7 +449,7 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
 			xen_dma_sync_for_cpu(dev, dma_addr, size, dir);
 	}
 
-	if (is_xen_swiotlb_buffer(dev, dma_addr))
+	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
 		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
 }
 
@@ -458,7 +459,7 @@ xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
 {
 	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
 
-	if (is_xen_swiotlb_buffer(dev, dma_addr))
+	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
 		swiotlb_sync_single_for_device(dev, paddr, size, dir);
 
 	if (!dev_is_dma_coherent(dev)) {
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 183e7103a66d..37fbd12bd4ab 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -142,7 +142,7 @@ int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
 size_t dma_max_mapping_size(struct device *dev);
-bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
+bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
 unsigned long dma_get_merge_boundary(struct device *dev);
 struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
 		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
@@ -258,7 +258,7 @@ static inline size_t dma_max_mapping_size(struct device *dev)
 {
 	return 0;
 }
-static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
+static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
 	return false;
 }
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..3e447f722d81 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -101,11 +101,13 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
 {
 	struct io_tlb_mem *mem = io_tlb_default_mem;
 
-	return mem && paddr >= mem->start && paddr < mem->end;
+	if (paddr + size <= paddr) /* wrapping */
+		return false;
+	return mem && paddr >= mem->start && paddr + size <= mem->end;
 }
 
 void __init swiotlb_exit(void);
@@ -115,7 +117,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
 {
 	return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..9ae6f94e868f 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
 	for_each_sg(sgl, sg, nents, i) {
 		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-		if (unlikely(is_swiotlb_buffer(paddr)))
+		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
 			swiotlb_sync_single_for_device(dev, paddr, sg->length,
 						       dir);
 
@@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
 		if (!dev_is_dma_coherent(dev))
 			arch_sync_dma_for_cpu(paddr, sg->length, dir);
 
-		if (unlikely(is_swiotlb_buffer(paddr)))
+		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
 			swiotlb_sync_single_for_cpu(dev, paddr, sg->length,
 						    dir);
 
@@ -501,10 +501,10 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 	return SIZE_MAX;
 }
 
-bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
+bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
 	return !dev_is_dma_coherent(dev) ||
-		is_swiotlb_buffer(dma_to_phys(dev, dma_addr));
+		is_swiotlb_buffer(dma_to_phys(dev, dma_addr), size);
 }
 
 /**
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 50afc05b6f1d..4a17e431ae56 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -16,7 +16,7 @@ bool dma_direct_can_mmap(struct device *dev);
 int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
-bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
+bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 		enum dma_data_direction dir, unsigned long attrs);
 size_t dma_direct_max_mapping_size(struct device *dev);
@@ -56,7 +56,7 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
 {
 	phys_addr_t paddr = dma_to_phys(dev, addr);
 
-	if (unlikely(is_swiotlb_buffer(paddr)))
+	if (unlikely(is_swiotlb_buffer(paddr, size)))
 		swiotlb_sync_single_for_device(dev, paddr, size, dir);
 
 	if (!dev_is_dma_coherent(dev))
@@ -73,7 +73,7 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
 		arch_sync_dma_for_cpu_all();
 	}
 
-	if (unlikely(is_swiotlb_buffer(paddr)))
+	if (unlikely(is_swiotlb_buffer(paddr, size)))
 		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
 
 	if (dir == DMA_FROM_DEVICE)
@@ -113,7 +113,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 
-	if (unlikely(is_swiotlb_buffer(phys)))
+	if (unlikely(is_swiotlb_buffer(phys, size)))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 #endif /* _KERNEL_DMA_DIRECT_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..9bf02c8d7d1b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -716,12 +716,12 @@ size_t dma_max_mapping_size(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dma_max_mapping_size);
 
-bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
+bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	if (dma_map_direct(dev, ops))
-		return dma_direct_need_sync(dev, dma_addr);
+		return dma_direct_need_sync(dev, dma_addr, size);
 	return ops->sync_single_for_cpu || ops->sync_single_for_device;
 }
 EXPORT_SYMBOL_GPL(dma_need_sync);
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 8de01aaac4a0..c1e404fe0cf4 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -399,7 +399,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 			__xp_dma_unmap(dma_map, attrs);
 			return -ENOMEM;
 		}
-		if (dma_need_sync(dev, dma))
+		if (dma_need_sync(dev, dma, PAGE_SIZE))
 			dma_map->dma_need_sync = true;
 		dma_map->dma_pages[i] = dma;
 	}
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 116+ messages in thread
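
The core of this patch is the size-aware range check in
is_swiotlb_buffer(). A minimal standalone sketch of the same check,
using hypothetical bounce-buffer bounds (not kernel code), shows how
mappings that straddle the end of the swiotlb or wrap around the
address space are rejected:

/* Standalone sketch of the size-aware check (hypothetical bounds). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct io_tlb_region {
	uint64_t start;		/* first byte of the bounce buffer */
	uint64_t end;		/* one past the last byte */
};

static bool in_swiotlb(const struct io_tlb_region *mem,
		       uint64_t paddr, uint64_t size)
{
	if (paddr + size <= paddr)	/* zero size or wrap-around */
		return false;
	return mem && paddr >= mem->start && paddr + size <= mem->end;
}

int main(void)
{
	/* hypothetical 64MB bounce buffer at 1GB */
	struct io_tlb_region mem = {
		.start = 1ULL << 30,
		.end   = (1ULL << 30) + (64ULL << 20),
	};

	printf("%d\n", in_swiotlb(&mem, mem.start + 4096, 4096)); /* 1: inside */
	printf("%d\n", in_swiotlb(&mem, mem.end - 16, 4096));     /* 0: crosses end */
	printf("%d\n", in_swiotlb(&mem, UINT64_MAX - 8, 4096));   /* 0: wraps */
	return 0;
}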

* [PATCH v1 6/8] dma: Add return value to dma_unmap_page
  2021-06-03  0:41 ` Andi Kleen
  (?)
@ 2021-06-03  0:41   ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel,
	Andi Kleen

In some situations, when we know swiotlb is forced and we have to
deal with untrusted hosts, it's useful to know whether a mapping
was inside the swiotlb or not. This allows us to abort any IO
operation that would access memory outside the swiotlb.

Otherwise it might be possible for a malicious host to inject any
guest page into a read operation. While it couldn't directly
access the results of the read() inside the guest, there might be
scenarios where data is echoed back with a write(), and that
would then leak guest memory.

Add a return value to dma_unmap_single/page. Most users will of
course ignore it. The return value is set to -EIO if we're in
forced swiotlb mode and the buffer is not inside the swiotlb
buffer. Otherwise it's always 0.

A new callback is used to avoid changing all the IOMMU drivers.
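
As an illustration, a driver-side caller could use the new return
value roughly like this (hypothetical function, assuming the buffer
was mapped with dma_map_single()):

#include <linux/dma-mapping.h>

/* Hypothetical RX completion path: refuse to consume the data when
 * the unmap reports the buffer was not inside the forced swiotlb. */
static int example_complete_rx(struct device *dev, dma_addr_t addr,
			       size_t len)
{
	int ret;

	ret = dma_unmap_single(dev, addr, len, DMA_FROM_DEVICE);
	if (ret)
		return ret;	/* -EIO: abort the IO operation */

	/* safe to hand the bounced buffer to the upper layers */
	return 0;
}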

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/iommu/dma-iommu.c   | 17 +++++++++++------
 include/linux/dma-map-ops.h |  3 +++
 include/linux/dma-mapping.h |  7 ++++---
 kernel/dma/mapping.c        |  6 +++++-
 4 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7ef13198721b..babe46f2ae3a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -491,7 +491,8 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
 	iommu_dma_free_iova(cookie, dma_addr, size, iotlb_gather.freelist);
 }
 
-static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
+static int __iommu_dma_unmap_swiotlb_check(struct device *dev,
+		dma_addr_t dma_addr,
 		size_t size, enum dma_data_direction dir,
 		unsigned long attrs)
 {
@@ -500,12 +501,15 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 
 	phys = iommu_iova_to_phys(domain, dma_addr);
 	if (WARN_ON(!phys))
-		return;
+		return -EIO;
 
 	__iommu_dma_unmap(dev, dma_addr, size);
 
 	if (unlikely(is_swiotlb_buffer(phys, size)))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+	else if (swiotlb_force == SWIOTLB_FORCE)
+		return -EIO;
+	return 0;
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
@@ -856,12 +860,13 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	return dma_handle;
 }
 
-static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+static int iommu_dma_unmap_page_check(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir);
-	__iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
+	return __iommu_dma_unmap_swiotlb_check(dev, dma_handle, size, dir,
+					       attrs);
 }
 
 /*
@@ -946,7 +951,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		__iommu_dma_unmap_swiotlb(dev, sg_dma_address(s),
+		__iommu_dma_unmap_swiotlb_check(dev, sg_dma_address(s),
 				sg_dma_len(s), dir, attrs);
 }
 
@@ -1291,7 +1296,7 @@ static const struct dma_map_ops iommu_dma_ops = {
 	.mmap			= iommu_dma_mmap,
 	.get_sgtable		= iommu_dma_get_sgtable,
 	.map_page		= iommu_dma_map_page,
-	.unmap_page		= iommu_dma_unmap_page,
+	.unmap_page_check	= iommu_dma_unmap_page_check,
 	.map_sg			= iommu_dma_map_sg,
 	.unmap_sg		= iommu_dma_unmap_sg,
 	.sync_single_for_cpu	= iommu_dma_sync_single_for_cpu,
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..0ed0190f7949 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -69,6 +69,9 @@ struct dma_map_ops {
 	u64 (*get_required_mask)(struct device *dev);
 	size_t (*max_mapping_size)(struct device *dev);
 	unsigned long (*get_merge_boundary)(struct device *dev);
+	int (*unmap_page_check)(struct device *dev, dma_addr_t dma_handle,
+			size_t size, enum dma_data_direction dir,
+			unsigned long attrs);
 };
 
 #ifdef CONFIG_DMA_OPS
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 37fbd12bd4ab..25b8382f8601 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -103,7 +103,7 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		size_t offset, size_t size, enum dma_data_direction dir,
 		unsigned long attrs);
-void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+int dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs);
 int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
 		enum dma_data_direction dir, unsigned long attrs);
@@ -160,9 +160,10 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 {
 	return DMA_MAPPING_ERROR;
 }
-static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
+static inline int dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
+	return 0;
 }
 static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
@@ -323,7 +324,7 @@ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
 			size, dir, attrs);
 }
 
-static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
+static inline int dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	return dma_unmap_page_attrs(dev, addr, size, dir, attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9bf02c8d7d1b..dc0ce649d1f9 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -162,18 +162,22 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 }
 EXPORT_SYMBOL(dma_map_page_attrs);
 
-void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+int dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
+	int ret = 0;
 
 	BUG_ON(!valid_dma_direction(dir));
 	if (dma_map_direct(dev, ops) ||
 	    arch_dma_unmap_page_direct(dev, addr + size))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
+	else if (ops->unmap_page_check)
+		ret = ops->unmap_page_check(dev, addr, size, dir, attrs);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	debug_dma_unmap_page(dev, addr, size, dir);
+	return ret;
 }
 EXPORT_SYMBOL(dma_unmap_page_attrs);
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 116+ messages in thread
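
For other IOMMU implementations the same pattern would apply: either
keep .unmap_page as-is, or provide the new .unmap_page_check callback,
which dma_unmap_page_attrs() prefers when it is set. A hypothetical
ops table (names invented; only iommu_dma_ops is converted by this
patch) might look like:

#include <linux/dma-map-ops.h>

static int foo_iommu_unmap_page_check(struct device *dev,
				      dma_addr_t dma_handle, size_t size,
				      enum dma_data_direction dir,
				      unsigned long attrs)
{
	/* unmap as before, then return -EIO if the mapping was not
	 * backed by the (forced) swiotlb, 0 otherwise */
	return 0;
}

static const struct dma_map_ops foo_iommu_dma_ops = {
	/* .unmap_page is not called once .unmap_page_check is set */
	.unmap_page_check	= foo_iommu_unmap_page_check,
};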

* [PATCH v1 7/8] virtio: Abort IO when descriptor points outside forced swiotlb
  2021-06-03  0:41 ` Andi Kleen
  (?)
@ 2021-06-03  0:41   ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel,
	Andi Kleen

Now that we have a return value for unmapping DMA mappings that
are outside the forced swiotlb, use that to abort the IO operation.

This prevents the host from subverting a read to access some data
in the guest address space, which it might then somehow gain
access to in another IO operation. It can still subvert reads to
point to other reads or other writes, but since it controls the
IO it can do that anyway.

This is only done for the split code path, which is the only
one supported with confidential guests.
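
Condensed from the diff below (simplified, not a drop-in
replacement), the detach path now stops at the first failing unmap
instead of walking on and handing the buffer back:

	/* simplified view of detach_buf_split() after this patch */
	i = head;
	while (vq->split.vring.desc[i].flags & nextflag) {
		int ret;

		ret = vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
		if (ret)
			return ret;	/* -EIO: buffer outside forced swiotlb */
		i = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].next);
		if (!inside_split_ring(vq, i))
			return -EIO;	/* corrupted next index */
	}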

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/virtio/virtio_ring.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1e9aa1e95e1b..244a5b62d85c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -365,29 +365,31 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
  * Split ring specific functions - *_split().
  */
 
-static void vring_unmap_one_split(const struct vring_virtqueue *vq,
+static int vring_unmap_one_split(const struct vring_virtqueue *vq,
 				  struct vring_desc *desc)
 {
 	u16 flags;
+	int ret;
 
 	if (!vq->use_dma_api)
-		return;
+		return 0;
 
 	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
 
 	if (flags & VRING_DESC_F_INDIRECT) {
-		dma_unmap_single(vring_dma_dev(vq),
+		ret = dma_unmap_single(vring_dma_dev(vq),
 				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
 				 virtio32_to_cpu(vq->vq.vdev, desc->len),
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
-		dma_unmap_page(vring_dma_dev(vq),
+		ret = dma_unmap_page(vring_dma_dev(vq),
 			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
 			       virtio32_to_cpu(vq->vq.vdev, desc->len),
 			       (flags & VRING_DESC_F_WRITE) ?
 			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	}
+	return ret;
 }
 
 static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
@@ -609,6 +611,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 			break;
 		if (!inside_split_ring(vq, i))
 			break;
+		/*
+		 * Ignore unmapping errors since
+		 * we're aborting anyways.
+		 */
 		vring_unmap_one_split(vq, &desc[i]);
 		i = virtio16_to_cpu(_vq->vdev, desc[i].next);
 	}
@@ -671,7 +677,10 @@ static int detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 	i = head;
 
 	while (vq->split.vring.desc[i].flags & nextflag) {
-		vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
+		int ret;
+		ret = vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
+		if (ret)
+			return ret;
 		i = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].next);
 		if (!inside_split_ring(vq, i))
 			return -EIO;
@@ -878,6 +887,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
 			continue;
 		/* detach_buf_split clears data, so grab it now. */
 		buf = vq->split.desc_state[i].data;
+		/* Ignore unmap errors because there is nothing to abort */
 		detach_buf_split(vq, i, NULL);
 		/* Don't need to check for error because nothing is returned */
 		vq->split.avail_idx_shadow--;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH v1 7/8] virtio: Abort IO when descriptor points outside forced swiotlb
@ 2021-06-03  0:41   ` Andi Kleen
  0 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: Andi Kleen, jasowang, x86, linux-kernel, virtualization, iommu,
	jpoimboe, robin.murphy, hch

Now that we have a return value for unmapping DMA mappings that
are outside the forced swiotlb, use that to abort the IO operation.

This prevents the host from subverting a read to access some
data in the guest address space, which it might then get access somehow in
another IO operation. It can subvert reads to point to other
reads or other writes, but since it controls IO it can do
that anyways.

This is only done for the split code path, which is the only
one supported with confidential guests.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/virtio/virtio_ring.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1e9aa1e95e1b..244a5b62d85c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -365,29 +365,31 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
  * Split ring specific functions - *_split().
  */
 
-static void vring_unmap_one_split(const struct vring_virtqueue *vq,
+static int vring_unmap_one_split(const struct vring_virtqueue *vq,
 				  struct vring_desc *desc)
 {
 	u16 flags;
+	int ret;
 
 	if (!vq->use_dma_api)
-		return;
+		return 0;
 
 	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
 
 	if (flags & VRING_DESC_F_INDIRECT) {
-		dma_unmap_single(vring_dma_dev(vq),
+		ret = dma_unmap_single(vring_dma_dev(vq),
 				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
 				 virtio32_to_cpu(vq->vq.vdev, desc->len),
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
-		dma_unmap_page(vring_dma_dev(vq),
+		ret = dma_unmap_page(vring_dma_dev(vq),
 			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
 			       virtio32_to_cpu(vq->vq.vdev, desc->len),
 			       (flags & VRING_DESC_F_WRITE) ?
 			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	}
+	return ret;
 }
 
 static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
@@ -609,6 +611,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 			break;
 		if (!inside_split_ring(vq, i))
 			break;
+		/*
+		 * Ignore unmapping errors since
+		 * we're aborting anyways.
+		 */
 		vring_unmap_one_split(vq, &desc[i]);
 		i = virtio16_to_cpu(_vq->vdev, desc[i].next);
 	}
@@ -671,7 +677,10 @@ static int detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 	i = head;
 
 	while (vq->split.vring.desc[i].flags & nextflag) {
-		vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
+		int ret;
+		ret = vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
+		if (ret)
+			return ret;
 		i = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].next);
 		if (!inside_split_ring(vq, i))
 			return -EIO;
@@ -878,6 +887,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
 			continue;
 		/* detach_buf_split clears data, so grab it now. */
 		buf = vq->split.desc_state[i].data;
+		/* Ignore unmap errors because there is nothing to abort */
 		detach_buf_split(vq, i, NULL);
 		/* Don't need to check for error because nothing is returned */
 		vq->split.avail_idx_shadow--;
-- 
2.25.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH v1 8/8] virtio: Error out on endless free lists
  2021-06-03  0:41 ` Andi Kleen
  (?)
@ 2021-06-03  0:41   ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03  0:41 UTC (permalink / raw)
  To: mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel,
	Andi Kleen

Error out with a warning when the free list walk runs longer
than the maximum number of descriptors while freeing descriptors.
While technically we don't care about DoS, it is still better to
abort early.

We ran into this problem while fuzzing the virtio interactions,
where the fuzzed code would get stuck in this loop for a long time.
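
A minimal sketch of the bound being enforced (standalone C with
illustrative types, not the virtio_ring.c code in the diff below): the
walk over the next chain is limited to the number of descriptors in the
ring, so a corrupted, circular chain terminates with an error instead of
spinning forever.

#include <errno.h>

#define F_NEXT	0x1u

struct desc {
	unsigned int flags;
	unsigned int next;
};

static int walk_desc_chain(const struct desc *ring, unsigned int num,
			   unsigned int head)
{
	unsigned int i = head;
	unsigned int freed = 0;

	while (ring[i].flags & F_NEXT) {
		i = ring[i].next;
		if (i >= num)		/* next points outside the ring */
			return -EIO;
		if (++freed > num)	/* longer than the ring: a loop */
			return -EIO;	/* the patch warns and bails out here */
	}
	return 0;
}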

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 drivers/virtio/virtio_ring.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 244a5b62d85c..96adaa4c5404 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -685,6 +685,11 @@ static int detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 		if (!inside_split_ring(vq, i))
 			return -EIO;
 		vq->vq.num_free++;
+		if (WARN_ONCE(vq->vq.num_free >
+				vq->split.queue_size_in_bytes /
+					sizeof(struct vring_desc),
+				"Virtio freelist corrupted"))
+			return -EIO;
 	}
 
 	vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: Virtio hardening for TDX
  2021-06-03  0:41 ` Andi Kleen
  (?)
@ 2021-06-03  1:34   ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-03  1:34 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/3 8:41 AM, Andi Kleen wrote:
> [v1: Initial post]
>
> With confidential computing like TDX the guest doesn't trust the host
> anymore. The host is allowed to DOS of course, but it is not allowed
> to read or write any guest memory not explicitely shared with it.
>
> This has implication for virtio. Traditionally virtio didn't assume
> the other side of the communication channel is malicious, and therefore
> didn't do any boundary checks in virtio ring data structures.
>
> This patchkit does hardening for virtio.  In a TDX like model
> the only host memory accesses allowed are in the virtio ring,
> as well as the (forced) swiotlb buffer.
>
> This patch kit does various changes to ensure there can be no
> access outside these two areas. It is possible for the host
> to break the communication, but this should result in a IO
> error on the guest, but no memory safety violations.
>
> virtio is quite complicated with many modes. To simplify
> the task we enforce that virtio is only in split mode without
> indirect descriptors, when running as a TDX guest. We also
> enforce use of the DMA API.
>
> Then these code paths are hardened against any corruptions
> on the ring.
>
> This patchkit has components in three subsystems:
> - Hardening changes to virtio, all in the generic virtio-ring
> - Hardening changes to kernel/dma swiotlb to harden swiotlb against
> malicious pointers. It requires an API change which needed a tree sweep.
> - A single x86 patch to enable the arch_has_restricted_memory_access
> for TDX
>
> It depends on Sathya's earlier patchkit that adds the basic infrastructure
> for TDX. This is only needed for the "am I running in TDX" part.


Note that this is probably needed for other cases as well:

1) Other encrypted VM technology
2) VDUSE[1]
3) Smart NICs

We have already had discussions about this, and some patches have been
posted[2][3][4].

I think the basic idea is similar: basically, we don't trust any
metadata provided by the device.

[2] is the series that uses metadata stored in the private memory,
which can't be accessed through swiotlb; that series aims to eliminate
all the possible attacks via virtqueue metadata.
[3] is one example of the used length validation.
[4] is the fix for the malicious config space.

Thanks

[1] https://www.spinics.net/lists/netdev/msg743264.html
[2] https://www.spinics.net/lists/kvm/msg241825.html
[3] https://patches.linaro.org/patch/450733/
[4] https://lkml.org/lkml/2021/5/17/376
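
A minimal sketch of that idea (standalone C, illustrative only, not
taken from any of the series above): treat lengths and ids reported by
the device as untrusted and validate them against driver-side state.

#include <stddef.h>

/* Never believe a length reported by the device; clamp it to what the
 * driver actually submitted. */
static size_t clamp_used_len(size_t reported, size_t submitted)
{
	return reported > submitted ? submitted : reported;
}

/* Likewise, reject a used-ring id that is out of range or not in flight. */
static int validate_used_id(unsigned int id, unsigned int num,
			    const unsigned char *in_flight)
{
	if (id >= num || !in_flight[id])
		return -1;
	return 0;
}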

>
>
>


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03  0:41   ` Andi Kleen
  (?)
@ 2021-06-03  1:36     ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-03  1:36 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/3 8:41 AM, Andi Kleen wrote:
> When running under TDX the virtio host is untrusted. The bulk
> of the kernel memory is encrypted and protected, but the virtio
> ring is in special shared memory that is shared with the
> untrusted host.
>
> This means virtio needs to be hardened against any attacks from
> the host through the ring. Of course it's impossible to prevent DOS
> (the host can chose at any time to stop doing IO), but there
> should be no buffer overruns or similar that might give access to
> any private memory in the guest.
>
> virtio has a lot of modes, most are difficult to harden.
>
> The best for hardening seems to be split mode without indirect
> descriptors. This also simplifies the hardening job because
> it's only a single code path.
>
> Only allow split mode when in a protected guest. Followon
> patches harden the split mode code paths, and we don't want
> an malicious host to force anything else. Also disallow
> indirect mode for similar reasons.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71e16b53e9c1..f35629fa47b1 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -11,6 +11,7 @@
>   #include <linux/module.h>
>   #include <linux/hrtimer.h>
>   #include <linux/dma-mapping.h>
> +#include <linux/protected_guest.h>
>   #include <xen/xen.h>
>   
>   #ifdef DEBUG
> @@ -2221,8 +2222,16 @@ void vring_transport_features(struct virtio_device *vdev)
>   	unsigned int i;
>   
>   	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
> +
> +		/*
> +		 * In protected guest mode disallow packed or indirect
> +		 * because they ain't hardened.
> +		 */
> +
>   		switch (i) {
>   		case VIRTIO_RING_F_INDIRECT_DESC:
> +			if (protected_guest_has(VM_MEM_ENCRYPT))
> +				goto clear;


So we will see a huge performance regression without indirect descriptors.
We need to consider addressing this.

Thanks


>   			break;
>   		case VIRTIO_RING_F_EVENT_IDX:
>   			break;
> @@ -2231,9 +2240,12 @@ void vring_transport_features(struct virtio_device *vdev)
>   		case VIRTIO_F_ACCESS_PLATFORM:
>   			break;
>   		case VIRTIO_F_RING_PACKED:
> +			if (protected_guest_has(VM_MEM_ENCRYPT))
> +				goto clear;
>   			break;
>   		case VIRTIO_F_ORDER_PLATFORM:
>   			break;
> +		clear:
>   		default:
>   			/* We don't understand this bit. */
>   			__virtio_clear_bit(vdev, i);


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03  1:36     ` Jason Wang
  (?)
@ 2021-06-03  1:48       ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03  1:48 UTC (permalink / raw)
  To: Jason Wang, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


> So we will see a huge performance regression without indirect
> descriptors. We need to consider addressing this.

A regression would be when some existing case becomes slower.

That's not the case here because the behavior for the existing cases does not
change.

Anyway, when there are performance problems they can be addressed, but
the first step is to make it secure.

-Andi


>
> Thanks
>
>
>>               break;
>>           case VIRTIO_RING_F_EVENT_IDX:
>>               break;
>> @@ -2231,9 +2240,12 @@ void vring_transport_features(struct 
>> virtio_device *vdev)
>>           case VIRTIO_F_ACCESS_PLATFORM:
>>               break;
>>           case VIRTIO_F_RING_PACKED:
>> +            if (protected_guest_has(VM_MEM_ENCRYPT))
>> +                goto clear;
>>               break;
>>           case VIRTIO_F_ORDER_PLATFORM:
>>               break;
>> +        clear:
>>           default:
>>               /* We don't understand this bit. */
>>               __virtio_clear_bit(vdev, i);
>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 5/8] dma: Use size for swiotlb boundary checks
  2021-06-03  0:41   ` Andi Kleen
  (?)
@ 2021-06-03  1:48     ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 116+ messages in thread
From: Konrad Rzeszutek Wilk @ 2021-06-03  1:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: mst, jasowang, virtualization, hch, m.szyprowski, robin.murphy,
	iommu, x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel

On Wed, Jun 02, 2021 at 05:41:30PM -0700, Andi Kleen wrote:
> swiotlb currently only uses the start address of a DMA to check if something
> is in the swiotlb or not. But with virtio and untrusted hosts the host
> could give some DMA mapping that crosses the swiotlb boundaries,
> potentially leaking or corrupting data. Add size checks to all the swiotlb
> checks and reject any DMAs that cross the swiotlb buffer boundaries.

I seem to be CC-ed only on this one and #7, so please bear with me.

But could you please explain to me why:

commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155 (swiotlb/stable/for-linus-5.12)
Author: Martin Radev <martin.b.radev@gmail.com>
Date:   Tue Jan 12 16:07:29 2021 +0100

    swiotlb: Validate bounce size in the sync/unmap path

does not solve the problem as well?
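
For reference, the check being added in the patch below boils down to
the following (a standalone sketch of the same logic, with the swiotlb
bounds passed in explicitly):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Size-aware membership test: the whole [paddr, paddr + size) range must
 * lie inside the bounce buffer, and the range must not wrap. */
static bool range_in_swiotlb(uint64_t start, uint64_t end,
			     uint64_t paddr, size_t size)
{
	if (paddr + size <= paddr)	/* zero size or wrap-around */
		return false;
	return paddr >= start && paddr + size <= end;
}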

> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  drivers/iommu/dma-iommu.c   | 13 ++++++-------
>  drivers/xen/swiotlb-xen.c   | 11 ++++++-----
>  include/linux/dma-mapping.h |  4 ++--
>  include/linux/swiotlb.h     |  8 +++++---
>  kernel/dma/direct.c         |  8 ++++----
>  kernel/dma/direct.h         |  8 ++++----
>  kernel/dma/mapping.c        |  4 ++--
>  net/xdp/xsk_buff_pool.c     |  2 +-
>  8 files changed, 30 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7bcdd1205535..7ef13198721b 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
>  
>  	__iommu_dma_unmap(dev, dma_addr, size);
>  
> -	if (unlikely(is_swiotlb_buffer(phys)))
> +	if (unlikely(is_swiotlb_buffer(phys, size)))
>  		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>  }
>  
> @@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
>  	}
>  
>  	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> -	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> +	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys, org_size))
>  		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
>  	return iova;
>  }
> @@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_cpu(phys, size, dir);
>  
> -	if (is_swiotlb_buffer(phys))
> +	if (is_swiotlb_buffer(phys, size))
>  		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
>  }
>  
> @@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
>  		return;
>  
>  	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> -	if (is_swiotlb_buffer(phys))
> +	if (is_swiotlb_buffer(phys, size))
>  		swiotlb_sync_single_for_device(dev, phys, size, dir);
>  
>  	if (!dev_is_dma_coherent(dev))
> @@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>  		if (!dev_is_dma_coherent(dev))
>  			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>  
> -		if (is_swiotlb_buffer(sg_phys(sg)))
> +		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>  			swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
>  						    sg->length, dir);
>  	}
> @@ -832,10 +832,9 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
>  		return;
>  
>  	for_each_sg(sgl, sg, nelems, i) {
> -		if (is_swiotlb_buffer(sg_phys(sg)))
> +		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>  			swiotlb_sync_single_for_device(dev, sg_phys(sg),
>  						       sg->length, dir);
> -
>  		if (!dev_is_dma_coherent(dev))
>  			arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
>  	}
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 24d11861ac7d..333846af8d35 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -89,7 +89,8 @@ static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
>  	return 0;
>  }
>  
> -static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
> +static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr,
> +				 size_t size)
>  {
>  	unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
>  	unsigned long xen_pfn = bfn_to_local_pfn(bfn);
> @@ -100,7 +101,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>  	 * in our domain. Therefore _only_ check address within our domain.
>  	 */
>  	if (pfn_valid(PFN_DOWN(paddr)))
> -		return is_swiotlb_buffer(paddr);
> +		return is_swiotlb_buffer(paddr, size);
>  	return 0;
>  }
>  
> @@ -431,7 +432,7 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>  	}
>  
>  	/* NOTE: We use dev_addr here, not paddr! */
> -	if (is_xen_swiotlb_buffer(hwdev, dev_addr))
> +	if (is_xen_swiotlb_buffer(hwdev, dev_addr, size))
>  		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
>  }
>  
> @@ -448,7 +449,7 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
>  			xen_dma_sync_for_cpu(dev, dma_addr, size, dir);
>  	}
>  
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> +	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
>  		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
>  }
>  
> @@ -458,7 +459,7 @@ xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
>  {
>  	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
>  
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> +	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
>  		swiotlb_sync_single_for_device(dev, paddr, size, dir);
>  
>  	if (!dev_is_dma_coherent(dev)) {
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 183e7103a66d..37fbd12bd4ab 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -142,7 +142,7 @@ int dma_set_mask(struct device *dev, u64 mask);
>  int dma_set_coherent_mask(struct device *dev, u64 mask);
>  u64 dma_get_required_mask(struct device *dev);
>  size_t dma_max_mapping_size(struct device *dev);
> -bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
> +bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
>  unsigned long dma_get_merge_boundary(struct device *dev);
>  struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
>  		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
> @@ -258,7 +258,7 @@ static inline size_t dma_max_mapping_size(struct device *dev)
>  {
>  	return 0;
>  }
> -static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
> +static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>  {
>  	return false;
>  }
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 216854a5e513..3e447f722d81 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -101,11 +101,13 @@ struct io_tlb_mem {
>  };
>  extern struct io_tlb_mem *io_tlb_default_mem;
>  
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
>  {
>  	struct io_tlb_mem *mem = io_tlb_default_mem;
>  
> -	return mem && paddr >= mem->start && paddr < mem->end;
> +	if (paddr + size <= paddr) /* wrapping */
> +		return false;
> +	return mem && paddr >= mem->start && paddr + size <= mem->end;
>  }
>  
>  void __init swiotlb_exit(void);
> @@ -115,7 +117,7 @@ bool is_swiotlb_active(void);
>  void __init swiotlb_adjust_size(unsigned long size);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
>  {
>  	return false;
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index f737e3347059..9ae6f94e868f 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>  	for_each_sg(sgl, sg, nents, i) {
>  		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
>  
> -		if (unlikely(is_swiotlb_buffer(paddr)))
> +		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
>  			swiotlb_sync_single_for_device(dev, paddr, sg->length,
>  						       dir);
>  
> @@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>  		if (!dev_is_dma_coherent(dev))
>  			arch_sync_dma_for_cpu(paddr, sg->length, dir);
>  
> -		if (unlikely(is_swiotlb_buffer(paddr)))
> +		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
>  			swiotlb_sync_single_for_cpu(dev, paddr, sg->length,
>  						    dir);
>  
> @@ -501,10 +501,10 @@ size_t dma_direct_max_mapping_size(struct device *dev)
>  	return SIZE_MAX;
>  }
>  
> -bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
> +bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>  {
>  	return !dev_is_dma_coherent(dev) ||
> -		is_swiotlb_buffer(dma_to_phys(dev, dma_addr));
> +		is_swiotlb_buffer(dma_to_phys(dev, dma_addr), size);
>  }
>  
>  /**
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 50afc05b6f1d..4a17e431ae56 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -16,7 +16,7 @@ bool dma_direct_can_mmap(struct device *dev);
>  int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
>  		void *cpu_addr, dma_addr_t dma_addr, size_t size,
>  		unsigned long attrs);
> -bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
> +bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
>  int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>  		enum dma_data_direction dir, unsigned long attrs);
>  size_t dma_direct_max_mapping_size(struct device *dev);
> @@ -56,7 +56,7 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
>  {
>  	phys_addr_t paddr = dma_to_phys(dev, addr);
>  
> -	if (unlikely(is_swiotlb_buffer(paddr)))
> +	if (unlikely(is_swiotlb_buffer(paddr, size)))
>  		swiotlb_sync_single_for_device(dev, paddr, size, dir);
>  
>  	if (!dev_is_dma_coherent(dev))
> @@ -73,7 +73,7 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
>  		arch_sync_dma_for_cpu_all();
>  	}
>  
> -	if (unlikely(is_swiotlb_buffer(paddr)))
> +	if (unlikely(is_swiotlb_buffer(paddr, size)))
>  		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
>  
>  	if (dir == DMA_FROM_DEVICE)
> @@ -113,7 +113,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>  		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
>  
> -	if (unlikely(is_swiotlb_buffer(phys)))
> +	if (unlikely(is_swiotlb_buffer(phys, size)))
>  		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>  }
>  #endif /* _KERNEL_DMA_DIRECT_H */
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 2b06a809d0b9..9bf02c8d7d1b 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -716,12 +716,12 @@ size_t dma_max_mapping_size(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(dma_max_mapping_size);
>  
> -bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
> +bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	if (dma_map_direct(dev, ops))
> -		return dma_direct_need_sync(dev, dma_addr);
> +		return dma_direct_need_sync(dev, dma_addr, size);
>  	return ops->sync_single_for_cpu || ops->sync_single_for_device;
>  }
>  EXPORT_SYMBOL_GPL(dma_need_sync);
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 8de01aaac4a0..c1e404fe0cf4 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -399,7 +399,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>  			__xp_dma_unmap(dma_map, attrs);
>  			return -ENOMEM;
>  		}
> -		if (dma_need_sync(dev, dma))
> +		if (dma_need_sync(dev, dma, PAGE_SIZE))
>  			dma_map->dma_need_sync = true;
>  		dma_map->dma_pages[i] = dma;
>  	}
> -- 
> 2.25.4
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 5/8] dma: Use size for swiotlb boundary checks
@ 2021-06-03  1:48     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 116+ messages in thread
From: Konrad Rzeszutek Wilk @ 2021-06-03  1:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: mst, jasowang, x86, linux-kernel, virtualization, iommu,
	jpoimboe, robin.murphy, hch

On Wed, Jun 02, 2021 at 05:41:30PM -0700, Andi Kleen wrote:
> swiotlb currently only uses the start address of a DMA to check if something
> is in the swiotlb or not. But with virtio and untrusted hosts the host
> could give some DMA mapping that crosses the swiotlb boundaries,
> potentially leaking or corrupting data. Add size checks to all the swiotlb
> checks and reject any DMAs that cross the swiotlb buffer boundaries.

I seem to be only CC-ed on this and #7, so please bear with me.

But could you explain to me why please:

commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155 (swiotlb/stable/for-linus-5.12)
Author: Martin Radev <martin.b.radev@gmail.com>
Date:   Tue Jan 12 16:07:29 2021 +0100

    swiotlb: Validate bounce size in the sync/unmap path

does not solve the problem as well?

> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  drivers/iommu/dma-iommu.c   | 13 ++++++-------
>  drivers/xen/swiotlb-xen.c   | 11 ++++++-----
>  include/linux/dma-mapping.h |  4 ++--
>  include/linux/swiotlb.h     |  8 +++++---
>  kernel/dma/direct.c         |  8 ++++----
>  kernel/dma/direct.h         |  8 ++++----
>  kernel/dma/mapping.c        |  4 ++--
>  net/xdp/xsk_buff_pool.c     |  2 +-
>  8 files changed, 30 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7bcdd1205535..7ef13198721b 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
>  
>  	__iommu_dma_unmap(dev, dma_addr, size);
>  
> -	if (unlikely(is_swiotlb_buffer(phys)))
> +	if (unlikely(is_swiotlb_buffer(phys, size)))
>  		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>  }
>  
> @@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
>  	}
>  
>  	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> -	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> +	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys, org_size))
>  		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
>  	return iova;
>  }
> @@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_cpu(phys, size, dir);
>  
> -	if (is_swiotlb_buffer(phys))
> +	if (is_swiotlb_buffer(phys, size))
>  		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
>  }
>  
> @@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
>  		return;
>  
>  	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> -	if (is_swiotlb_buffer(phys))
> +	if (is_swiotlb_buffer(phys, size))
>  		swiotlb_sync_single_for_device(dev, phys, size, dir);
>  
>  	if (!dev_is_dma_coherent(dev))
> @@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>  		if (!dev_is_dma_coherent(dev))
>  			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>  
> -		if (is_swiotlb_buffer(sg_phys(sg)))
> +		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>  			swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
>  						    sg->length, dir);
>  	}
> @@ -832,10 +832,9 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
>  		return;
>  
>  	for_each_sg(sgl, sg, nelems, i) {
> -		if (is_swiotlb_buffer(sg_phys(sg)))
> +		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>  			swiotlb_sync_single_for_device(dev, sg_phys(sg),
>  						       sg->length, dir);
> -
>  		if (!dev_is_dma_coherent(dev))
>  			arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
>  	}
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 24d11861ac7d..333846af8d35 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -89,7 +89,8 @@ static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
>  	return 0;
>  }
>  
> -static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
> +static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr,
> +				 size_t size)
>  {
>  	unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
>  	unsigned long xen_pfn = bfn_to_local_pfn(bfn);
> @@ -100,7 +101,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>  	 * in our domain. Therefore _only_ check address within our domain.
>  	 */
>  	if (pfn_valid(PFN_DOWN(paddr)))
> -		return is_swiotlb_buffer(paddr);
> +		return is_swiotlb_buffer(paddr, size);
>  	return 0;
>  }
>  
> @@ -431,7 +432,7 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>  	}
>  
>  	/* NOTE: We use dev_addr here, not paddr! */
> -	if (is_xen_swiotlb_buffer(hwdev, dev_addr))
> +	if (is_xen_swiotlb_buffer(hwdev, dev_addr, size))
>  		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
>  }
>  
> @@ -448,7 +449,7 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
>  			xen_dma_sync_for_cpu(dev, dma_addr, size, dir);
>  	}
>  
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> +	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
>  		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
>  }
>  
> @@ -458,7 +459,7 @@ xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
>  {
>  	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
>  
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> +	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
>  		swiotlb_sync_single_for_device(dev, paddr, size, dir);
>  
>  	if (!dev_is_dma_coherent(dev)) {
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 183e7103a66d..37fbd12bd4ab 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -142,7 +142,7 @@ int dma_set_mask(struct device *dev, u64 mask);
>  int dma_set_coherent_mask(struct device *dev, u64 mask);
>  u64 dma_get_required_mask(struct device *dev);
>  size_t dma_max_mapping_size(struct device *dev);
> -bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
> +bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
>  unsigned long dma_get_merge_boundary(struct device *dev);
>  struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
>  		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
> @@ -258,7 +258,7 @@ static inline size_t dma_max_mapping_size(struct device *dev)
>  {
>  	return 0;
>  }
> -static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
> +static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>  {
>  	return false;
>  }
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 216854a5e513..3e447f722d81 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -101,11 +101,13 @@ struct io_tlb_mem {
>  };
>  extern struct io_tlb_mem *io_tlb_default_mem;
>  
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
>  {
>  	struct io_tlb_mem *mem = io_tlb_default_mem;
>  
> -	return mem && paddr >= mem->start && paddr < mem->end;
> +	if (paddr + size <= paddr) /* wrapping */
> +		return false;
> +	return mem && paddr >= mem->start && paddr + size <= mem->end;
>  }
>  
>  void __init swiotlb_exit(void);
> @@ -115,7 +117,7 @@ bool is_swiotlb_active(void);
>  void __init swiotlb_adjust_size(unsigned long size);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
>  {
>  	return false;
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index f737e3347059..9ae6f94e868f 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>  	for_each_sg(sgl, sg, nents, i) {
>  		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
>  
> -		if (unlikely(is_swiotlb_buffer(paddr)))
> +		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
>  			swiotlb_sync_single_for_device(dev, paddr, sg->length,
>  						       dir);
>  
> @@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>  		if (!dev_is_dma_coherent(dev))
>  			arch_sync_dma_for_cpu(paddr, sg->length, dir);
>  
> -		if (unlikely(is_swiotlb_buffer(paddr)))
> +		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
>  			swiotlb_sync_single_for_cpu(dev, paddr, sg->length,
>  						    dir);
>  
> @@ -501,10 +501,10 @@ size_t dma_direct_max_mapping_size(struct device *dev)
>  	return SIZE_MAX;
>  }
>  
> -bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
> +bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>  {
>  	return !dev_is_dma_coherent(dev) ||
> -		is_swiotlb_buffer(dma_to_phys(dev, dma_addr));
> +		is_swiotlb_buffer(dma_to_phys(dev, dma_addr), size);
>  }
>  
>  /**
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 50afc05b6f1d..4a17e431ae56 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -16,7 +16,7 @@ bool dma_direct_can_mmap(struct device *dev);
>  int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
>  		void *cpu_addr, dma_addr_t dma_addr, size_t size,
>  		unsigned long attrs);
> -bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
> +bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
>  int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>  		enum dma_data_direction dir, unsigned long attrs);
>  size_t dma_direct_max_mapping_size(struct device *dev);
> @@ -56,7 +56,7 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
>  {
>  	phys_addr_t paddr = dma_to_phys(dev, addr);
>  
> -	if (unlikely(is_swiotlb_buffer(paddr)))
> +	if (unlikely(is_swiotlb_buffer(paddr, size)))
>  		swiotlb_sync_single_for_device(dev, paddr, size, dir);
>  
>  	if (!dev_is_dma_coherent(dev))
> @@ -73,7 +73,7 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
>  		arch_sync_dma_for_cpu_all();
>  	}
>  
> -	if (unlikely(is_swiotlb_buffer(paddr)))
> +	if (unlikely(is_swiotlb_buffer(paddr, size)))
>  		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
>  
>  	if (dir == DMA_FROM_DEVICE)
> @@ -113,7 +113,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>  		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
>  
> -	if (unlikely(is_swiotlb_buffer(phys)))
> +	if (unlikely(is_swiotlb_buffer(phys, size)))
>  		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>  }
>  #endif /* _KERNEL_DMA_DIRECT_H */
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 2b06a809d0b9..9bf02c8d7d1b 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -716,12 +716,12 @@ size_t dma_max_mapping_size(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(dma_max_mapping_size);
>  
> -bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
> +bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	if (dma_map_direct(dev, ops))
> -		return dma_direct_need_sync(dev, dma_addr);
> +		return dma_direct_need_sync(dev, dma_addr, size);
>  	return ops->sync_single_for_cpu || ops->sync_single_for_device;
>  }
>  EXPORT_SYMBOL_GPL(dma_need_sync);
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 8de01aaac4a0..c1e404fe0cf4 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -399,7 +399,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>  			__xp_dma_unmap(dma_map, attrs);
>  			return -ENOMEM;
>  		}
> -		if (dma_need_sync(dev, dma))
> +		if (dma_need_sync(dev, dma, PAGE_SIZE))
>  			dma_map->dma_need_sync = true;
>  		dma_map->dma_pages[i] = dma;
>  	}
> -- 
> 2.25.4
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: Virtio hardening for TDX
  2021-06-03  1:34   ` Jason Wang
@ 2021-06-03  1:56     ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03  1:56 UTC (permalink / raw)
  To: Jason Wang, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


> Note that it's probably needed by other cases as well:
>
> 1) Other encrypted VM technology
> 2) VDUSE[1]
> 3) Smart NICs

Right. I don't see any reason why these shouldn't work. You may just 
need to add the enable for the lockdown, but you can reuse the basic 
infrastructure.

>
> We have already had discussions and some patches have been 
> posted[2][3][4].

Thanks.

Yes, [2] is indeed an alternative. We considered this at some point, but 
since we don't care about DOS in our case it seemed simpler to just 
harden the existing code. But yes, if it's there it's useful for TDX too.

FWIW I would argue that the descriptor boundary checking should be added 
in any case, with or without the security use case or the separated 
metadata, because it can catch bugs and is very cheap. Checking 
boundaries is good practice.
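
A minimal sketch of the kind of check meant here (illustrative only, not 
code from any of the posted series; it assumes the split-ring fields used 
elsewhere in this thread):

/* Reject any free-list/descriptor index that would land outside the
 * ring before it is ever used to index ring memory. */
static inline bool desc_index_in_ring(const struct vring_virtqueue *vq,
				      unsigned int idx)
{
	/* a single compare per descriptor hop, essentially free */
	return idx < vq->split.vring.num;
}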

[4] would be an independent issue; that's something we didn't catch.

Also the swiotlb hardening implemented in this patchkit doesn't seem to 
be in any of the other patches.
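
As a self-contained worked example of what the size-aware swiotlb check 
catches (illustrative addresses only, plain userspace C just to show the 
arithmetic):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* pretend the bounce pool is [start, end) */
	uint64_t start = 0x10000000, end = 0x14000000;
	/* a mapping whose tail crosses the end of the pool */
	uint64_t paddr = 0x13fff000, size = 0x4000;

	bool start_only = paddr >= start && paddr < end;
	bool with_size  = paddr >= start && paddr + size > paddr &&
			  paddr + size <= end;

	/* prints "start-only: 1, size-aware: 0" */
	printf("start-only: %d, size-aware: %d\n", start_only, with_size);
	return 0;
}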

So I would say my patches are mostly orthogonal to these patches below 
and not conflicting, even though they address a similar problem space.

-Andi



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 5/8] dma: Use size for swiotlb boundary checks
  2021-06-03  1:48     ` Konrad Rzeszutek Wilk
@ 2021-06-03  2:03       ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03  2:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: mst, jasowang, virtualization, hch, m.szyprowski, robin.murphy,
	iommu, x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 6/2/2021 6:48 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 02, 2021 at 05:41:30PM -0700, Andi Kleen wrote:
>> swiotlb currently only uses the start address of a DMA to check if something
>> is in the swiotlb or not. But with virtio and untrusted hosts the host
>> could give some DMA mapping that crosses the swiotlb boundaries,
>> potentially leaking or corrupting data. Add size checks to all the swiotlb
>> checks and reject any DMAs that cross the swiotlb buffer boundaries.
> I seem to be only CC-ed on this and #7, so please bear with me.
You weren't cc'ed originally, so if you get partial emails it must be 
through some list.
>
> But could you explain to me why please:
>
> commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155 (swiotlb/stable/for-linus-5.12)
> Author: Martin Radev <martin.b.radev@gmail.com>
> Date:   Tue Jan 12 16:07:29 2021 +0100
>
>      swiotlb: Validate bounce size in the sync/unmap path
>
> does not solve the problem as well?

Thanks. I missed that patch; it was a race condition between the two 
patch sets.

One major difference of my patch is that it supports an error return, 
which allows virtio to error out. This is important in virtio because 
otherwise you'll end up with uninitialized memory on the target without 
any indication. This uninitialized memory could be a potential attack 
vector on the guest memory, e.g. if the attacker finds some way to echo 
it out again.

But the error return could be added to your infrastructure too, which 
would make this patch much shorter. I'll take a look at that.
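
As a sketch of the pattern (a hypothetical helper, not code from either 
series; kernel types and errno values assumed): validate the bounds and 
hand the caller an error it can propagate as -EIO, instead of warning and 
carrying on with a buffer that was never actually bounced:

/* Hypothetical illustration: a bounce step that reports failure so the
 * virtio caller can fail the request instead of exposing a buffer that
 * was never initialized by a copy. */
static int bounce_checked(phys_addr_t paddr, size_t size,
			  phys_addr_t start, phys_addr_t end)
{
	if (paddr + size <= paddr)		/* overflow/wrap */
		return -EIO;
	if (paddr < start || paddr + size > end)
		return -EIO;			/* crosses the bounce buffer */
	/* ... do the actual copy to/from the bounce slot here ... */
	return 0;
}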

-Andi






^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 2/8] virtio: Add boundary checks to virtio ring
  2021-06-03  0:41   ` Andi Kleen
@ 2021-06-03  2:14     ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-03  2:14 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/3 at 8:41 AM, Andi Kleen wrote:
> In protected guest mode we don't trust the host.
>
> This means we need to make sure the host cannot subvert us through
> virtio communication. In general it can corrupt our virtio data
> and cause a DOS, but it should not be able to access any data
> that is not explicitely under IO.
>
> Also boundary checking so that the free list (which is accessible
> to the host) cannot point outside the virtio ring. Note it could
> still contain loops or similar, but these should only cause an DOS,
> not a memory corruption or leak.
>
> When we detect any out of bounds descriptor trigger an IO error.
> We also use a WARN() (in case it was a software bug instead of
> an attack). This implies that a malicious host can flood
> the guest kernel log, but that's only a DOS and acceptable
> in the threat model.
>
> This patch only hardens the initial consumption of the free list,
> the freeing comes later.
>
> Any of these errors can cause DMA memory leaks, but there is nothing
> we can do about that and that would be just a DOS.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++++++++++++----
>   1 file changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index f35629fa47b1..d37ff5a0ff58 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -413,6 +413,15 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
>   	return desc;
>   }
>   
> +/* assumes no indirect mode */
> +static inline bool inside_split_ring(struct vring_virtqueue *vq,
> +				     unsigned index)
> +{
> +	return !WARN(index >= vq->split.vring.num,
> +		    "desc index %u out of bounds (%u)\n",
> +		    index, vq->split.vring.num);


It's better to use BAD_RING() to stop the virtqueue in this case.
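
A rough sketch of that suggestion (illustrative, not a tested patch; it 
assumes BAD_RING() marks the ring broken as it does elsewhere in 
virtio_ring.c):

/* Report the corruption and mark the virtqueue broken instead of only
 * WARN()ing, so subsequent operations fail fast. */
static inline bool inside_split_ring(struct vring_virtqueue *vq,
				     unsigned int index)
{
	if (unlikely(index >= vq->split.vring.num)) {
		BAD_RING(vq, "desc index %u out of bounds (%u)\n",
			 index, vq->split.vring.num);
		return false;
	}
	return true;
}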


> +}
> +
>   static inline int virtqueue_add_split(struct virtqueue *_vq,
>   				      struct scatterlist *sgs[],
>   				      unsigned int total_sg,
> @@ -428,6 +437,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   	unsigned int i, n, avail, descs_used, prev, err_idx;
>   	int head;
>   	bool indirect;
> +	int io_err;
>   
>   	START_USE(vq);
>   
> @@ -481,7 +491,13 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   
>   	for (n = 0; n < out_sgs; n++) {
>   		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> +			dma_addr_t addr;
> +
> +			io_err = -EIO;
> +			if (!inside_split_ring(vq, i))
> +				goto unmap_release;
> +			io_err = -ENOMEM;
> +			addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
>   			if (vring_mapping_error(vq, addr))
>   				goto unmap_release;
>   
> @@ -494,7 +510,13 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   	}
>   	for (; n < (out_sgs + in_sgs); n++) {
>   		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> +			dma_addr_t addr;
> +
> +			io_err = -EIO;
> +			if (!inside_split_ring(vq, i))
> +				goto unmap_release;
> +			io_err = -ENOMEM;
> +			addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
>   			if (vring_mapping_error(vq, addr))
>   				goto unmap_release;


It looks to me like all the evils come from the fact that we depend on 
the descriptor ring.

So the checks in this patch would be unnecessary if we don't even read 
from the descriptor ring, which can be manipulated by the device.

This is what my series tries to achieve:

https://www.spinics.net/lists/kvm/msg241825.html
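
Conceptually (an illustrative sketch, not code from the series linked 
above), the idea is to keep a driver-private shadow of the chaining 
metadata and never walk device-writable ring memory:

/* Illustration only: a driver-owned shadow of per-descriptor state. */
struct desc_shadow {
	u16 next;			/* driver's own copy of the chain link */
};

static u16 next_in_chain(const struct desc_shadow *shadow, u16 i)
{
	/* instead of reading desc[i].next back from the shared ring */
	return shadow[i].next;
}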

Thanks



>   
> @@ -513,6 +535,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   		dma_addr_t addr = vring_map_single(
>   			vq, desc, total_sg * sizeof(struct vring_desc),
>   			DMA_TO_DEVICE);
> +		io_err = -ENOMEM;
>   		if (vring_mapping_error(vq, addr))
>   			goto unmap_release;
>   
> @@ -528,6 +551,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   	/* We're using some buffers from the free list. */
>   	vq->vq.num_free -= descs_used;
>   
> +	io_err = -EIO;
> +	if (!inside_split_ring(vq, head))
> +		goto unmap_release;
> +
>   	/* Update free pointer */
>   	if (indirect)
>   		vq->free_head = virtio16_to_cpu(_vq->vdev,
> @@ -545,6 +572,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   	/* Put entry in available array (but don't update avail->idx until they
>   	 * do sync). */
>   	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> +
> +	if (avail >= vq->split.vring.num)
> +		goto unmap_release;
> +
>   	vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
>   
>   	/* Descriptors and available array need to be set before we expose the
> @@ -576,6 +607,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   	for (n = 0; n < total_sg; n++) {
>   		if (i == err_idx)
>   			break;
> +		if (!inside_split_ring(vq, i))
> +			break;
>   		vring_unmap_one_split(vq, &desc[i]);
>   		i = virtio16_to_cpu(_vq->vdev, desc[i].next);
>   	}
> @@ -584,7 +617,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   		kfree(desc);
>   
>   	END_USE(vq);
> -	return -ENOMEM;
> +	return io_err;
>   }
>   
>   static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
> @@ -1146,7 +1179,12 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>   	c = 0;
>   	for (n = 0; n < out_sgs + in_sgs; n++) {
>   		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -			dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> +			dma_addr_t addr;
> +
> +			if (curr >= vq->packed.vring.num)
> +				goto unmap_release;
> +
> +			addr = vring_map_one_sg(vq, sg, n < out_sgs ?
>   					DMA_TO_DEVICE : DMA_FROM_DEVICE);
>   			if (vring_mapping_error(vq, addr))
>   				goto unmap_release;


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 2/8] virtio: Add boundary checks to virtio ring
  2021-06-03  2:14     ` Jason Wang
@ 2021-06-03  2:18       ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03  2:18 UTC (permalink / raw)
  To: Jason Wang, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


> It looks to me all the evils came from the fact that we depends on the 
> descriptor ring.
>
> So the checks in this patch could is unnecessary if we don't even read 
> from the descriptor ring which could be manipulated by the device.
>
> This is what my series tries to achieve:
>
> https://www.spinics.net/lists/kvm/msg241825.html

I would argue that you should do the boundary checks in any case. It was 
always a bug to not have boundary checks in such a data structure with 
multiple users, trusted or not.

But yes, your patch series is interesting and definitely makes sense for 
TDX too.

Best would be to have both I guess, and always check the boundaries 
everywhere.

So what's the merge status of your series?

-Andi



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 3/8] virtio: Harden split buffer detachment
  2021-06-03  0:41   ` Andi Kleen
@ 2021-06-03  2:29     ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-03  2:29 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/3 at 8:41 AM, Andi Kleen wrote:
> Harden the split buffer detachment path by adding boundary checking. Note
> that when this fails we may fail to unmap some swiotlb mapping, which could
> result in a leak and a DOS. But that's acceptable because an malicious host
> can DOS us anyways.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d37ff5a0ff58..1e9aa1e95e1b 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -651,12 +651,19 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
>   	return needs_kick;
>   }
>   
> -static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> -			     void **ctx)
> +static int detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> +			    void **ctx)
>   {
>   	unsigned int i, j;
>   	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
>   
> +	/* We'll leak DMA mappings when this happens, but nothing
> +	 * can be done about that. In the worst case the host
> +	 * could DOS us, but it can of course do that anyways.
> +	 */
> +	if (!inside_split_ring(vq, head))
> +		return -EIO;


I think the caller has already done this for us, with even more checks on 
the token (virtqueue_get_buf_ctx_split()):

         if (unlikely(i >= vq->split.vring.num)) {
                 BAD_RING(vq, "id %u out of range\n", i);
                 return NULL;
         }
         if (unlikely(!vq->split.desc_state[i].data)) {
                 BAD_RING(vq, "id %u is not a head!\n", i);
                 return NULL;
         }


> +
>   	/* Clear data ptr. */
>   	vq->split.desc_state[head].data = NULL;
>   
> @@ -666,6 +673,8 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>   	while (vq->split.vring.desc[i].flags & nextflag) {
>   		vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
>   		i = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].next);
> +		if (!inside_split_ring(vq, i))
> +			return -EIO;


Similarly, if we don't depend on the metadata stored in the descriptor, 
we don't need this check.


>   		vq->vq.num_free++;
>   	}
>   
> @@ -684,7 +693,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>   
>   		/* Free the indirect table, if any, now that it's unmapped. */
>   		if (!indir_desc)
> -			return;
> +			return 0;
>   
>   		len = virtio32_to_cpu(vq->vq.vdev,
>   				vq->split.vring.desc[head].len);
> @@ -701,6 +710,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>   	} else if (ctx) {
>   		*ctx = vq->split.desc_state[head].indir_desc;
>   	}
> +	return 0;
>   }
>   
>   static inline bool more_used_split(const struct vring_virtqueue *vq)
> @@ -717,6 +727,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>   	void *ret;
>   	unsigned int i;
>   	u16 last_used;
> +	int err;
>   
>   	START_USE(vq);
>   
> @@ -751,7 +762,12 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>   
>   	/* detach_buf_split clears data, so grab it now. */
>   	ret = vq->split.desc_state[i].data;
> -	detach_buf_split(vq, i, ctx);
> +	err = detach_buf_split(vq, i, ctx);
> +	if (err) {
> +		END_USE(vq);


This reminds me that we don't use END_USE() after BAD_RING(), which 
should be fixed.
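
For illustration, the fix would presumably just add the END_USE() 
bookkeeping to error paths like the one quoted above (sketch only, inside 
virtqueue_get_buf_ctx_split()):

	if (unlikely(i >= vq->split.vring.num)) {
		BAD_RING(vq, "id %u out of range\n", i);
		END_USE(vq);	/* added: balance the earlier START_USE() */
		return NULL;
	}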

Thanks


> +		return NULL;
> +	}
> +
>   	vq->last_used_idx++;
>   	/* If we expect an interrupt for the next entry, tell host
>   	 * by writing event index and flush out the write before
> @@ -863,6 +879,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
>   		/* detach_buf_split clears data, so grab it now. */
>   		buf = vq->split.desc_state[i].data;
>   		detach_buf_split(vq, i, NULL);
> +		/* Don't need to check for error because nothing is returned */
>   		vq->split.avail_idx_shadow--;
>   		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>   				vq->split.avail_idx_shadow);


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 3/8] virtio: Harden split buffer detachment
@ 2021-06-03  2:29     ` Jason Wang
  0 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-03  2:29 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: x86, linux-kernel, virtualization, iommu, jpoimboe, robin.murphy, hch


在 2021/6/3 上午8:41, Andi Kleen 写道:
> Harden the split buffer detachment path by adding boundary checking. Note
> that when this fails we may fail to unmap some swiotlb mapping, which could
> result in a leak and a DOS. But that's acceptable because an malicious host
> can DOS us anyways.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d37ff5a0ff58..1e9aa1e95e1b 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -651,12 +651,19 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
>   	return needs_kick;
>   }
>   
> -static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> -			     void **ctx)
> +static int detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> +			    void **ctx)
>   {
>   	unsigned int i, j;
>   	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
>   
> +	/* We'll leak DMA mappings when this happens, but nothing
> +	 * can be done about that. In the worst case the host
> +	 * could DOS us, but it can of course do that anyways.
> +	 */
> +	if (!inside_split_ring(vq, head))
> +		return -EIO;


I think the caller have already did this for us with even more check on 
the token (virtqueue_get_buf_ctx_split()):

         if (unlikely(i >= vq->split.vring.num)) {
                 BAD_RING(vq, "id %u out of range\n", i);
                 return NULL;
         }
         if (unlikely(!vq->split.desc_state[i].data)) {
                 BAD_RING(vq, "id %u is not a head!\n", i);
                 return NULL;
         }


> +
>   	/* Clear data ptr. */
>   	vq->split.desc_state[head].data = NULL;
>   
> @@ -666,6 +673,8 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>   	while (vq->split.vring.desc[i].flags & nextflag) {
>   		vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
>   		i = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].next);
> +		if (!inside_split_ring(vq, i))
> +			return -EIO;


Similarly, if we don't depend on the metadata stored in the descriptor, 
we don't need this check.


>   		vq->vq.num_free++;
>   	}
>   
> @@ -684,7 +693,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>   
>   		/* Free the indirect table, if any, now that it's unmapped. */
>   		if (!indir_desc)
> -			return;
> +			return 0;
>   
>   		len = virtio32_to_cpu(vq->vq.vdev,
>   				vq->split.vring.desc[head].len);
> @@ -701,6 +710,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>   	} else if (ctx) {
>   		*ctx = vq->split.desc_state[head].indir_desc;
>   	}
> +	return 0;
>   }
>   
>   static inline bool more_used_split(const struct vring_virtqueue *vq)
> @@ -717,6 +727,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>   	void *ret;
>   	unsigned int i;
>   	u16 last_used;
> +	int err;
>   
>   	START_USE(vq);
>   
> @@ -751,7 +762,12 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>   
>   	/* detach_buf_split clears data, so grab it now. */
>   	ret = vq->split.desc_state[i].data;
> -	detach_buf_split(vq, i, ctx);
> +	err = detach_buf_split(vq, i, ctx);
> +	if (err) {
> +		END_USE(vq);


This reminds me that we don't use END_USE() after BAD_RING() which 
should be fixed.

Thanks


> +		return NULL;
> +	}
> +
>   	vq->last_used_idx++;
>   	/* If we expect an interrupt for the next entry, tell host
>   	 * by writing event index and flush out the write before
> @@ -863,6 +879,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
>   		/* detach_buf_split clears data, so grab it now. */
>   		buf = vq->split.desc_state[i].data;
>   		detach_buf_split(vq, i, NULL);
> +		/* Don't need to check for error because nothing is returned */
>   		vq->split.avail_idx_shadow--;
>   		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>   				vq->split.avail_idx_shadow);

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 3/8] virtio: Harden split buffer detachment
@ 2021-06-03  2:29     ` Jason Wang
  0 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-03  2:29 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: sathyanarayanan.kuppuswamy, x86, linux-kernel, virtualization,
	iommu, jpoimboe, robin.murphy, hch, m.szyprowski


在 2021/6/3 上午8:41, Andi Kleen 写道:
> Harden the split buffer detachment path by adding boundary checking. Note
> that when this fails we may fail to unmap some swiotlb mapping, which could
> result in a leak and a DOS. But that's acceptable because an malicious host
> can DOS us anyways.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d37ff5a0ff58..1e9aa1e95e1b 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -651,12 +651,19 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
>   	return needs_kick;
>   }
>   
> -static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> -			     void **ctx)
> +static int detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> +			    void **ctx)
>   {
>   	unsigned int i, j;
>   	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
>   
> +	/* We'll leak DMA mappings when this happens, but nothing
> +	 * can be done about that. In the worst case the host
> +	 * could DOS us, but it can of course do that anyways.
> +	 */
> +	if (!inside_split_ring(vq, head))
> +		return -EIO;


I think the caller have already did this for us with even more check on 
the token (virtqueue_get_buf_ctx_split()):

         if (unlikely(i >= vq->split.vring.num)) {
                 BAD_RING(vq, "id %u out of range\n", i);
                 return NULL;
         }
         if (unlikely(!vq->split.desc_state[i].data)) {
                 BAD_RING(vq, "id %u is not a head!\n", i);
                 return NULL;
         }


> +
>   	/* Clear data ptr. */
>   	vq->split.desc_state[head].data = NULL;
>   
> @@ -666,6 +673,8 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>   	while (vq->split.vring.desc[i].flags & nextflag) {
>   		vring_unmap_one_split(vq, &vq->split.vring.desc[i]);
>   		i = virtio16_to_cpu(vq->vq.vdev, vq->split.vring.desc[i].next);
> +		if (!inside_split_ring(vq, i))
> +			return -EIO;


Similarly, if we don't depend on the metadata stored in the descriptor, 
we don't need this check.


>   		vq->vq.num_free++;
>   	}
>   
> @@ -684,7 +693,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>   
>   		/* Free the indirect table, if any, now that it's unmapped. */
>   		if (!indir_desc)
> -			return;
> +			return 0;
>   
>   		len = virtio32_to_cpu(vq->vq.vdev,
>   				vq->split.vring.desc[head].len);
> @@ -701,6 +710,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>   	} else if (ctx) {
>   		*ctx = vq->split.desc_state[head].indir_desc;
>   	}
> +	return 0;
>   }
>   
>   static inline bool more_used_split(const struct vring_virtqueue *vq)
> @@ -717,6 +727,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>   	void *ret;
>   	unsigned int i;
>   	u16 last_used;
> +	int err;
>   
>   	START_USE(vq);
>   
> @@ -751,7 +762,12 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>   
>   	/* detach_buf_split clears data, so grab it now. */
>   	ret = vq->split.desc_state[i].data;
> -	detach_buf_split(vq, i, ctx);
> +	err = detach_buf_split(vq, i, ctx);
> +	if (err) {
> +		END_USE(vq);


This reminds me that we don't call END_USE() after BAD_RING(), which 
should be fixed.

Thanks


> +		return NULL;
> +	}
> +
>   	vq->last_used_idx++;
>   	/* If we expect an interrupt for the next entry, tell host
>   	 * by writing event index and flush out the write before
> @@ -863,6 +879,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
>   		/* detach_buf_split clears data, so grab it now. */
>   		buf = vq->split.desc_state[i].data;
>   		detach_buf_split(vq, i, NULL);
> +		/* Don't need to check for error because nothing is returned */
>   		vq->split.avail_idx_shadow--;
>   		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>   				vq->split.avail_idx_shadow);
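
As a self-contained illustration of the boundary checking being
discussed, a hardened chain walk could look roughly like this (a
sketch only, not the actual patch; the struct, the helper name and
the cycle cap are invented for this example):

#include <linux/errno.h>
#include <linux/types.h>

#define SKETCH_DESC_F_NEXT 1		/* stands in for VRING_DESC_F_NEXT */

struct sketch_desc {			/* simplified stand-in for vring_desc */
	u16 flags;
	u16 next;
};

/*
 * Walk a descriptor chain whose 'next' fields live in host-visible
 * memory: bounds check the head and every next index, and cap the
 * number of steps so a cycle crafted by the host cannot hang us.
 */
static int walk_chain_checked(const struct sketch_desc *desc, u16 num,
			      u16 head)
{
	u16 i = head;
	u16 steps = 0;

	if (i >= num)			/* head outside the ring */
		return -EIO;
	while (desc[i].flags & SKETCH_DESC_F_NEXT) {
		i = desc[i].next;
		if (i >= num)		/* bogus next index */
			return -EIO;
		if (++steps > num)	/* cycle in the chain */
			return -EIO;
	}
	return 0;
}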


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03  1:48       ` Andi Kleen
  (?)
@ 2021-06-03  2:32         ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-03  2:32 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/3 9:48 AM, Andi Kleen wrote:
>
>> So we will see huge performance regression without indirect 
>> descriptor. We need to consider to address this.
>
> A regression would be when some existing case would be slower.
>
> That's not the case because the behavior for the existing cases does 
> not change.
>
> Anyways when there are performance problems they can be addressed, but 
> first is to make it secure.


I agree, but I want to know why the indirect descriptor needs to be 
disabled. The table can't be written by the device, since it's not a 
coherent swiotlb mapping.

Thanks


>
> -Andi
>
>
>>
>> Thanks
>>
>>
>>>               break;
>>>           case VIRTIO_RING_F_EVENT_IDX:
>>>               break;
>>> @@ -2231,9 +2240,12 @@ void vring_transport_features(struct 
>>> virtio_device *vdev)
>>>           case VIRTIO_F_ACCESS_PLATFORM:
>>>               break;
>>>           case VIRTIO_F_RING_PACKED:
>>> +            if (protected_guest_has(VM_MEM_ENCRYPT))
>>> +                goto clear;
>>>               break;
>>>           case VIRTIO_F_ORDER_PLATFORM:
>>>               break;
>>> +        clear:
>>>           default:
>>>               /* We don't understand this bit. */
>>>               __virtio_clear_bit(vdev, i);
>>
>
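
For context, the feature filtering under discussion amounts to roughly
the following simplified sketch (protected_guest_has() and
VM_MEM_ENCRYPT come from the patch under review, not mainline, and
treating VIRTIO_RING_F_INDIRECT_DESC the same way as
VIRTIO_F_RING_PACKED is an assumption based on the cover letter rather
than the quoted hunk):

#include <linux/types.h>
#include <linux/virtio_config.h>	/* VIRTIO_F_RING_PACKED */
#include <linux/virtio_ring.h>		/* VIRTIO_RING_F_INDIRECT_DESC */

/* Sketch: which ring-layout features a memory-encrypted guest keeps. */
static bool vring_feature_allowed_sketch(unsigned int fbit)
{
	if (!protected_guest_has(VM_MEM_ENCRYPT))
		return true;		/* unprotected guests keep everything */

	switch (fbit) {
	case VIRTIO_F_RING_PACKED:		/* force split mode only */
	case VIRTIO_RING_F_INDIRECT_DESC:	/* no indirect descriptor tables */
		return false;
	default:
		return true;
	}
}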


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 2/8] virtio: Add boundary checks to virtio ring
  2021-06-03  2:18       ` Andi Kleen
  (?)
@ 2021-06-03  2:36         ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-03  2:36 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/3 10:18 AM, Andi Kleen wrote:
>
>> It looks to me all the evils came from the fact that we depends on 
>> the descriptor ring.
>>
>> So the checks in this patch could is unnecessary if we don't even 
>> read from the descriptor ring which could be manipulated by the device.
>>
>> This is what my series tries to achieve:
>>
>> https://www.spinics.net/lists/kvm/msg241825.html
>
> I would argue that you should boundary check in any case. It was 
> always a bug to not have boundary checks in such a data structure with 
> multiple users, trust or not.
>
> But yes your patch series is interesting and definitely makes sense 
> for TDX too.
>
> Best would be to have both I guess, and always check the boundaries 
> everywhere.


I agree, but some of the checks are unnecessary if we do this series on 
top of my series.


>
> So what's the merge status of your series?


If I understand Michael correctly, I will send a formal series and 
he will try to merge it for 5.14.

Thanks


>
> -Andi
>
>


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03  2:32         ` Jason Wang
  (?)
@ 2021-06-03  2:56           ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03  2:56 UTC (permalink / raw)
  To: Jason Wang, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


>
> I agree, but I want to know why indirect descriptor needs to be 
> disabled. The table can't be wrote by the device since it's not 
> coherent swiotlb mapping.

I had all kinds of problems with uninitialized entries in the indirect 
table. So I gave up on it and concluded it would be too difficult to secure.


-Andi



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03  2:56           ` Andi Kleen
  (?)
@ 2021-06-03  3:02             ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-03  3:02 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/3 10:56 AM, Andi Kleen wrote:
>
>>
>> I agree, but I want to know why indirect descriptor needs to be 
>> disabled. The table can't be wrote by the device since it's not 
>> coherent swiotlb mapping.
>
> I had all kinds of problems with uninitialized entries in the indirect 
> table. So I gave up on it and concluded it would be too difficult to 
> secure.
>
>
> -Andi
>
>

Ok, but what I meant is this: if we don't read from the descriptor ring, 
and we validate all the other metadata supplied by the device (used id and 
len), then there should be no way for the device to suppress the DMA 
flags to write to the indirect descriptor table.

Or do you have an example of how it can do that?

Thanks
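
Spelled out, the kind of validation described above is roughly this (a
sketch only; the structure and field names are invented for
illustration and do not match the real vring_virtqueue layout):

#include <linux/errno.h>
#include <linux/types.h>

/* Hypothetical, simplified driver-side bookkeeping. */
struct used_check_state {
	u32 num;			/* ring size */
	struct {
		void *data;		/* token posted by the driver, NULL if free */
		u32 posted_len;		/* bytes the driver actually exposed */
	} *desc_state;
};

/*
 * Everything the device reports through the used ring is range checked
 * against driver-private state before it is trusted.
 */
static int validate_used_elem(const struct used_check_state *vq,
			      u32 id, u32 len)
{
	if (id >= vq->num)			/* id outside the ring */
		return -EIO;
	if (!vq->desc_state[id].data)		/* id was never posted as a head */
		return -EIO;
	if (len > vq->desc_state[id].posted_len) /* device over-reports length */
		return -EIO;
	return 0;
}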


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 4/8] x86/tdx: Add arch_has_restricted_memory_access for TDX
  2021-06-03  0:41   ` Andi Kleen
@ 2021-06-03  4:02     ` Kuppuswamy, Sathyanarayanan
  -1 siblings, 0 replies; 116+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-06-03  4:02 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, jpoimboe, linux-kernel



On 6/2/21 5:41 PM, Andi Kleen wrote:
> +int arch_has_restricted_virtio_memory_access(void)
> +{
> +	return is_tdx_guest();
> +}
> +EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);
> +

This function definition needs to be removed from arch/x86/mm/mem_encrypt.c.

Otherwise, if you enable both CONFIG_AMD_MEM_ENCRYPT and
CONFIG_X86_MEM_ENCRYPT_COMMON, it will generate a multiple definition error.

--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -493,9 +493,3 @@ void __init amd_mem_encrypt_init(void)

         print_mem_encrypt_feature_info();
  }
-
-int arch_has_restricted_virtio_memory_access(void)
-{
-       return sev_active();
-}
-EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);

--- a/arch/x86/mm/mem_encrypt_common.c
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -40,7 +40,7 @@ void __init mem_encrypt_init(void)

  int arch_has_restricted_virtio_memory_access(void)
  {
-       return is_tdx_guest();
+       return (is_tdx_guest() || sev_active());
  }
  EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);


-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 6/8] dma: Add return value to dma_unmap_page
  2021-06-03  0:41   ` Andi Kleen
  (?)
@ 2021-06-03  9:08     ` Robin Murphy
  -1 siblings, 0 replies; 116+ messages in thread
From: Robin Murphy @ 2021-06-03  9:08 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: jasowang, virtualization, hch, m.szyprowski, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel

Hi Andi,

On 2021-06-03 01:41, Andi Kleen wrote:
> In some situations when we know swiotlb is forced and we have
> to deal with untrusted hosts, it's useful to know if a mapping
> was in the swiotlb or not. This allows us to abort any IO
> operation that would access memory outside the swiotlb.
> 
> Otherwise it might be possible for a malicious host to inject
> any guest page in a read operation. While it couldn't directly
> access the results of the read() inside the guest, there
> might scenarios where data is echoed back with a write(),
> and that would then leak guest memory.
> 
> Add a return value to dma_unmap_single/page. Most users
> of course will ignore it. The return value is set to EIO
> if we're in forced swiotlb mode and the buffer is not inside
> the swiotlb buffer. Otherwise it's always 0.

I have to say my first impression of this isn't too good :(

What it looks like to me is abusing SWIOTLB's internal housekeeping to 
keep track of virtio-specific state. The DMA API does not attempt to 
validate calls in general since in many cases the additional overhead 
would be prohibitive. It has always been the callers' responsibility to keep 
track of what they mapped and to make sure sync/unmap calls match, and 
there are many, many subtle and not-so-subtle ways for things to go 
wrong if they don't. If virtio is not doing a good enough job of that, 
what's the justification for making it the DMA API's problem?

> A new callback is used to avoid changing all the IOMMU drivers.

Nit: presumably by "IOMMU drivers" you actually mean arch DMA API backends?

As an aside, we'll take a look at the rest of the series from the 
perspective of our prototyping for Arm's Confidential Compute 
Architecture, but I'm not sure we'll need it, since accesses beyond the 
bounds of the shared SWIOTLB buffer shouldn't be an issue for us. 
Furthermore, AFAICS it's still not going to help against exfiltrating 
guest memory by over-unmapping the original SWIOTLB slot *without* going 
past the end of the whole buffer, but I think Martin's patch *has* 
addressed that already.

Robin.

> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>   drivers/iommu/dma-iommu.c   | 17 +++++++++++------
>   include/linux/dma-map-ops.h |  3 +++
>   include/linux/dma-mapping.h |  7 ++++---
>   kernel/dma/mapping.c        |  6 +++++-
>   4 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7ef13198721b..babe46f2ae3a 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -491,7 +491,8 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
>   	iommu_dma_free_iova(cookie, dma_addr, size, iotlb_gather.freelist);
>   }
>   
> -static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
> +static int __iommu_dma_unmap_swiotlb_check(struct device *dev,
> +		dma_addr_t dma_addr,
>   		size_t size, enum dma_data_direction dir,
>   		unsigned long attrs)
>   {
> @@ -500,12 +501,15 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
>   
>   	phys = iommu_iova_to_phys(domain, dma_addr);
>   	if (WARN_ON(!phys))
> -		return;
> +		return -EIO;
>   
>   	__iommu_dma_unmap(dev, dma_addr, size);
>   
>   	if (unlikely(is_swiotlb_buffer(phys, size)))
>   		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> +	else if (swiotlb_force == SWIOTLB_FORCE)
> +		return -EIO;
> +	return 0;
>   }
>   
>   static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
> @@ -856,12 +860,13 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   	return dma_handle;
>   }
>   
> -static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
> +static int iommu_dma_unmap_page_check(struct device *dev, dma_addr_t dma_handle,
>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>   {
>   	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>   		iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir);
> -	__iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
> +	return __iommu_dma_unmap_swiotlb_check(dev, dma_handle, size, dir,
> +					       attrs);
>   }
>   
>   /*
> @@ -946,7 +951,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
>   	int i;
>   
>   	for_each_sg(sg, s, nents, i)
> -		__iommu_dma_unmap_swiotlb(dev, sg_dma_address(s),
> +		__iommu_dma_unmap_swiotlb_check(dev, sg_dma_address(s),
>   				sg_dma_len(s), dir, attrs);
>   }
>   
> @@ -1291,7 +1296,7 @@ static const struct dma_map_ops iommu_dma_ops = {
>   	.mmap			= iommu_dma_mmap,
>   	.get_sgtable		= iommu_dma_get_sgtable,
>   	.map_page		= iommu_dma_map_page,
> -	.unmap_page		= iommu_dma_unmap_page,
> +	.unmap_page_check	= iommu_dma_unmap_page_check,
>   	.map_sg			= iommu_dma_map_sg,
>   	.unmap_sg		= iommu_dma_unmap_sg,
>   	.sync_single_for_cpu	= iommu_dma_sync_single_for_cpu,
> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> index 0d53a96a3d64..0ed0190f7949 100644
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -69,6 +69,9 @@ struct dma_map_ops {
>   	u64 (*get_required_mask)(struct device *dev);
>   	size_t (*max_mapping_size)(struct device *dev);
>   	unsigned long (*get_merge_boundary)(struct device *dev);
> +	int (*unmap_page_check)(struct device *dev, dma_addr_t dma_handle,
> +			size_t size, enum dma_data_direction dir,
> +			unsigned long attrs);
>   };
>   
>   #ifdef CONFIG_DMA_OPS
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 37fbd12bd4ab..25b8382f8601 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -103,7 +103,7 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
>   dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
>   		size_t offset, size_t size, enum dma_data_direction dir,
>   		unsigned long attrs);
> -void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
> +int dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
>   		enum dma_data_direction dir, unsigned long attrs);
>   int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
>   		enum dma_data_direction dir, unsigned long attrs);
> @@ -160,9 +160,10 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
>   {
>   	return DMA_MAPPING_ERROR;
>   }
> -static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
> +static inline int dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>   {
> +	return 0;
>   }
>   static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs)
> @@ -323,7 +324,7 @@ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
>   			size, dir, attrs);
>   }
>   
> -static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
> +static inline int dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>   {
>   	return dma_unmap_page_attrs(dev, addr, size, dir, attrs);
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 9bf02c8d7d1b..dc0ce649d1f9 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -162,18 +162,22 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
>   }
>   EXPORT_SYMBOL(dma_map_page_attrs);
>   
> -void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
> +int dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
>   		enum dma_data_direction dir, unsigned long attrs)
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
> +	int ret = 0;
>   
>   	BUG_ON(!valid_dma_direction(dir));
>   	if (dma_map_direct(dev, ops) ||
>   	    arch_dma_unmap_page_direct(dev, addr + size))
>   		dma_direct_unmap_page(dev, addr, size, dir, attrs);
> +	else if (ops->unmap_page_check)
> +		ret = ops->unmap_page_check(dev, addr, size, dir, attrs);
>   	else if (ops->unmap_page)
>   		ops->unmap_page(dev, addr, size, dir, attrs);
>   	debug_dma_unmap_page(dev, addr, size, dir);
> +	return ret;
>   }
>   EXPORT_SYMBOL(dma_unmap_page_attrs);
>   
> 
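
For illustration, a caller consuming the return value proposed in the
patch above might look roughly like this (a sketch that assumes the
int-returning dma_unmap_single_attrs() from this patch; the wrapper
itself is made up):

#include <linux/dma-mapping.h>
#include <linux/errno.h>

/*
 * Unmap a buffer and refuse to use its contents if the unmap reports
 * that the mapping was not inside the forced swiotlb (-EIO above).
 */
static int unmap_and_check_sketch(struct device *dev, dma_addr_t addr,
				  size_t len, enum dma_data_direction dir)
{
	int err = dma_unmap_single_attrs(dev, addr, len, dir, 0);

	if (err)	/* host pointed us outside the swiotlb: abort the I/O */
		return err;
	return 0;	/* data went through the bounce buffer as expected */
}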

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 5/8] dma: Use size for swiotlb boundary checks
  2021-06-03  0:41   ` Andi Kleen
  (?)
@ 2021-06-03  9:09     ` Robin Murphy
  -1 siblings, 0 replies; 116+ messages in thread
From: Robin Murphy @ 2021-06-03  9:09 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: jasowang, virtualization, hch, m.szyprowski, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel

On 2021-06-03 01:41, Andi Kleen wrote:
> swiotlb currently only uses the start address of a DMA to check if something
> is in the swiotlb or not. But with virtio and untrusted hosts the host
> could give some DMA mapping that crosses the swiotlb boundaries,
> potentially leaking or corrupting data. Add size checks to all the swiotlb
> checks and reject any DMAs that cross the swiotlb buffer boundaries.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>   drivers/iommu/dma-iommu.c   | 13 ++++++-------
>   drivers/xen/swiotlb-xen.c   | 11 ++++++-----
>   include/linux/dma-mapping.h |  4 ++--
>   include/linux/swiotlb.h     |  8 +++++---
>   kernel/dma/direct.c         |  8 ++++----
>   kernel/dma/direct.h         |  8 ++++----
>   kernel/dma/mapping.c        |  4 ++--
>   net/xdp/xsk_buff_pool.c     |  2 +-
>   8 files changed, 30 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7bcdd1205535..7ef13198721b 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
>   
>   	__iommu_dma_unmap(dev, dma_addr, size);

If you can't trust 'size' below, then you've already corrupted the IOMMU 
pagetables here :/

Robin.

> -	if (unlikely(is_swiotlb_buffer(phys)))
> +	if (unlikely(is_swiotlb_buffer(phys, size)))
>   		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>   }
>   
> @@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
>   	}
>   
>   	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> -	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> +	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys, org_size))
>   		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
>   	return iova;
>   }
> @@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>   	if (!dev_is_dma_coherent(dev))
>   		arch_sync_dma_for_cpu(phys, size, dir);
>   
> -	if (is_swiotlb_buffer(phys))
> +	if (is_swiotlb_buffer(phys, size))
>   		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
>   }
>   
> @@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
>   		return;
>   
>   	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> -	if (is_swiotlb_buffer(phys))
> +	if (is_swiotlb_buffer(phys, size))
>   		swiotlb_sync_single_for_device(dev, phys, size, dir);
>   
>   	if (!dev_is_dma_coherent(dev))
> @@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>   		if (!dev_is_dma_coherent(dev))
>   			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>   
> -		if (is_swiotlb_buffer(sg_phys(sg)))
> +		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>   			swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
>   						    sg->length, dir);
>   	}
> @@ -832,10 +832,9 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
>   		return;
>   
>   	for_each_sg(sgl, sg, nelems, i) {
> -		if (is_swiotlb_buffer(sg_phys(sg)))
> +		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>   			swiotlb_sync_single_for_device(dev, sg_phys(sg),
>   						       sg->length, dir);
> -
>   		if (!dev_is_dma_coherent(dev))
>   			arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
>   	}
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 24d11861ac7d..333846af8d35 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -89,7 +89,8 @@ static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
>   	return 0;
>   }
>   
> -static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
> +static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr,
> +				 size_t size)
>   {
>   	unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
>   	unsigned long xen_pfn = bfn_to_local_pfn(bfn);
> @@ -100,7 +101,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>   	 * in our domain. Therefore _only_ check address within our domain.
>   	 */
>   	if (pfn_valid(PFN_DOWN(paddr)))
> -		return is_swiotlb_buffer(paddr);
> +		return is_swiotlb_buffer(paddr, size);
>   	return 0;
>   }
>   
> @@ -431,7 +432,7 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>   	}
>   
>   	/* NOTE: We use dev_addr here, not paddr! */
> -	if (is_xen_swiotlb_buffer(hwdev, dev_addr))
> +	if (is_xen_swiotlb_buffer(hwdev, dev_addr, size))
>   		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
>   }
>   
> @@ -448,7 +449,7 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
>   			xen_dma_sync_for_cpu(dev, dma_addr, size, dir);
>   	}
>   
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> +	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
>   		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
>   }
>   
> @@ -458,7 +459,7 @@ xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
>   {
>   	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
>   
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> +	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
>   		swiotlb_sync_single_for_device(dev, paddr, size, dir);
>   
>   	if (!dev_is_dma_coherent(dev)) {
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 183e7103a66d..37fbd12bd4ab 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -142,7 +142,7 @@ int dma_set_mask(struct device *dev, u64 mask);
>   int dma_set_coherent_mask(struct device *dev, u64 mask);
>   u64 dma_get_required_mask(struct device *dev);
>   size_t dma_max_mapping_size(struct device *dev);
> -bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
> +bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
>   unsigned long dma_get_merge_boundary(struct device *dev);
>   struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
>   		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
> @@ -258,7 +258,7 @@ static inline size_t dma_max_mapping_size(struct device *dev)
>   {
>   	return 0;
>   }
> -static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
> +static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>   {
>   	return false;
>   }
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 216854a5e513..3e447f722d81 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -101,11 +101,13 @@ struct io_tlb_mem {
>   };
>   extern struct io_tlb_mem *io_tlb_default_mem;
>   
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
>   {
>   	struct io_tlb_mem *mem = io_tlb_default_mem;
>   
> -	return mem && paddr >= mem->start && paddr < mem->end;
> +	if (paddr + size <= paddr) /* wrapping */
> +		return false;
> +	return mem && paddr >= mem->start && paddr + size <= mem->end;
>   }
>   
>   void __init swiotlb_exit(void);
> @@ -115,7 +117,7 @@ bool is_swiotlb_active(void);
>   void __init swiotlb_adjust_size(unsigned long size);
>   #else
>   #define swiotlb_force SWIOTLB_NO_FORCE
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
>   {
>   	return false;
>   }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index f737e3347059..9ae6f94e868f 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>   	for_each_sg(sgl, sg, nents, i) {
>   		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
>   
> -		if (unlikely(is_swiotlb_buffer(paddr)))
> +		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
>   			swiotlb_sync_single_for_device(dev, paddr, sg->length,
>   						       dir);
>   
> @@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>   		if (!dev_is_dma_coherent(dev))
>   			arch_sync_dma_for_cpu(paddr, sg->length, dir);
>   
> -		if (unlikely(is_swiotlb_buffer(paddr)))
> +		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
>   			swiotlb_sync_single_for_cpu(dev, paddr, sg->length,
>   						    dir);
>   
> @@ -501,10 +501,10 @@ size_t dma_direct_max_mapping_size(struct device *dev)
>   	return SIZE_MAX;
>   }
>   
> -bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
> +bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>   {
>   	return !dev_is_dma_coherent(dev) ||
> -		is_swiotlb_buffer(dma_to_phys(dev, dma_addr));
> +		is_swiotlb_buffer(dma_to_phys(dev, dma_addr), size);
>   }
>   
>   /**
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 50afc05b6f1d..4a17e431ae56 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -16,7 +16,7 @@ bool dma_direct_can_mmap(struct device *dev);
>   int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
>   		void *cpu_addr, dma_addr_t dma_addr, size_t size,
>   		unsigned long attrs);
> -bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
> +bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
>   int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>   		enum dma_data_direction dir, unsigned long attrs);
>   size_t dma_direct_max_mapping_size(struct device *dev);
> @@ -56,7 +56,7 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
>   {
>   	phys_addr_t paddr = dma_to_phys(dev, addr);
>   
> -	if (unlikely(is_swiotlb_buffer(paddr)))
> +	if (unlikely(is_swiotlb_buffer(paddr, size)))
>   		swiotlb_sync_single_for_device(dev, paddr, size, dir);
>   
>   	if (!dev_is_dma_coherent(dev))
> @@ -73,7 +73,7 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
>   		arch_sync_dma_for_cpu_all();
>   	}
>   
> -	if (unlikely(is_swiotlb_buffer(paddr)))
> +	if (unlikely(is_swiotlb_buffer(paddr, size)))
>   		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
>   
>   	if (dir == DMA_FROM_DEVICE)
> @@ -113,7 +113,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>   	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>   		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
>   
> -	if (unlikely(is_swiotlb_buffer(phys)))
> +	if (unlikely(is_swiotlb_buffer(phys, size)))
>   		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>   }
>   #endif /* _KERNEL_DMA_DIRECT_H */
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 2b06a809d0b9..9bf02c8d7d1b 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -716,12 +716,12 @@ size_t dma_max_mapping_size(struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(dma_max_mapping_size);
>   
> -bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
> +bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
>   	if (dma_map_direct(dev, ops))
> -		return dma_direct_need_sync(dev, dma_addr);
> +		return dma_direct_need_sync(dev, dma_addr, size);
>   	return ops->sync_single_for_cpu || ops->sync_single_for_device;
>   }
>   EXPORT_SYMBOL_GPL(dma_need_sync);
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 8de01aaac4a0..c1e404fe0cf4 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -399,7 +399,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>   			__xp_dma_unmap(dma_map, attrs);
>   			return -ENOMEM;
>   		}
> -		if (dma_need_sync(dev, dma))
> +		if (dma_need_sync(dev, dma, PAGE_SIZE))
>   			dma_map->dma_need_sync = true;
>   		dma_map->dma_pages[i] = dma;
>   	}
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 5/8] dma: Use size for swiotlb boundary checks
@ 2021-06-03  9:09     ` Robin Murphy
  0 siblings, 0 replies; 116+ messages in thread
From: Robin Murphy @ 2021-06-03  9:09 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: jasowang, x86, linux-kernel, virtualization, iommu, jpoimboe, hch

On 2021-06-03 01:41, Andi Kleen wrote:
> swiotlb currently only uses the start address of a DMA to check if something
> is in the swiotlb or not. But with virtio and untrusted hosts the host
> could give some DMA mapping that crosses the swiotlb boundaries,
> potentially leaking or corrupting data. Add size checks to all the swiotlb
> checks and reject any DMAs that cross the swiotlb buffer boundaries.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>   drivers/iommu/dma-iommu.c   | 13 ++++++-------
>   drivers/xen/swiotlb-xen.c   | 11 ++++++-----
>   include/linux/dma-mapping.h |  4 ++--
>   include/linux/swiotlb.h     |  8 +++++---
>   kernel/dma/direct.c         |  8 ++++----
>   kernel/dma/direct.h         |  8 ++++----
>   kernel/dma/mapping.c        |  4 ++--
>   net/xdp/xsk_buff_pool.c     |  2 +-
>   8 files changed, 30 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7bcdd1205535..7ef13198721b 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
>   
>   	__iommu_dma_unmap(dev, dma_addr, size);

If you can't trust size below then you've already corrupted the IOMMU 
pagetables here :/

Robin.

> -	if (unlikely(is_swiotlb_buffer(phys)))
> +	if (unlikely(is_swiotlb_buffer(phys, size)))
>   		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>   }
>   
> @@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
>   	}
>   
>   	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> -	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> +	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys, org_size))
>   		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
>   	return iova;
>   }
> @@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>   	if (!dev_is_dma_coherent(dev))
>   		arch_sync_dma_for_cpu(phys, size, dir);
>   
> -	if (is_swiotlb_buffer(phys))
> +	if (is_swiotlb_buffer(phys, size))
>   		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
>   }
>   
> @@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
>   		return;
>   
>   	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> -	if (is_swiotlb_buffer(phys))
> +	if (is_swiotlb_buffer(phys, size))
>   		swiotlb_sync_single_for_device(dev, phys, size, dir);
>   
>   	if (!dev_is_dma_coherent(dev))
> @@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>   		if (!dev_is_dma_coherent(dev))
>   			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>   
> -		if (is_swiotlb_buffer(sg_phys(sg)))
> +		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>   			swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
>   						    sg->length, dir);
>   	}
> @@ -832,10 +832,9 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
>   		return;
>   
>   	for_each_sg(sgl, sg, nelems, i) {
> -		if (is_swiotlb_buffer(sg_phys(sg)))
> +		if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>   			swiotlb_sync_single_for_device(dev, sg_phys(sg),
>   						       sg->length, dir);
> -
>   		if (!dev_is_dma_coherent(dev))
>   			arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
>   	}
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 24d11861ac7d..333846af8d35 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -89,7 +89,8 @@ static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
>   	return 0;
>   }
>   
> -static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
> +static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr,
> +				 size_t size)
>   {
>   	unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
>   	unsigned long xen_pfn = bfn_to_local_pfn(bfn);
> @@ -100,7 +101,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>   	 * in our domain. Therefore _only_ check address within our domain.
>   	 */
>   	if (pfn_valid(PFN_DOWN(paddr)))
> -		return is_swiotlb_buffer(paddr);
> +		return is_swiotlb_buffer(paddr, size);
>   	return 0;
>   }
>   
> @@ -431,7 +432,7 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>   	}
>   
>   	/* NOTE: We use dev_addr here, not paddr! */
> -	if (is_xen_swiotlb_buffer(hwdev, dev_addr))
> +	if (is_xen_swiotlb_buffer(hwdev, dev_addr, size))
>   		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
>   }
>   
> @@ -448,7 +449,7 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
>   			xen_dma_sync_for_cpu(dev, dma_addr, size, dir);
>   	}
>   
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> +	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
>   		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
>   }
>   
> @@ -458,7 +459,7 @@ xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
>   {
>   	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
>   
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> +	if (is_xen_swiotlb_buffer(dev, dma_addr, size))
>   		swiotlb_sync_single_for_device(dev, paddr, size, dir);
>   
>   	if (!dev_is_dma_coherent(dev)) {
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 183e7103a66d..37fbd12bd4ab 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -142,7 +142,7 @@ int dma_set_mask(struct device *dev, u64 mask);
>   int dma_set_coherent_mask(struct device *dev, u64 mask);
>   u64 dma_get_required_mask(struct device *dev);
>   size_t dma_max_mapping_size(struct device *dev);
> -bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
> +bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
>   unsigned long dma_get_merge_boundary(struct device *dev);
>   struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
>   		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
> @@ -258,7 +258,7 @@ static inline size_t dma_max_mapping_size(struct device *dev)
>   {
>   	return 0;
>   }
> -static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
> +static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>   {
>   	return false;
>   }
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 216854a5e513..3e447f722d81 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -101,11 +101,13 @@ struct io_tlb_mem {
>   };
>   extern struct io_tlb_mem *io_tlb_default_mem;
>   
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
>   {
>   	struct io_tlb_mem *mem = io_tlb_default_mem;
>   
> -	return mem && paddr >= mem->start && paddr < mem->end;
> +	if (paddr + size <= paddr) /* wrapping */
> +		return false;
> +	return mem && paddr >= mem->start && paddr + size <= mem->end;
>   }
>   
>   void __init swiotlb_exit(void);
> @@ -115,7 +117,7 @@ bool is_swiotlb_active(void);
>   void __init swiotlb_adjust_size(unsigned long size);
>   #else
>   #define swiotlb_force SWIOTLB_NO_FORCE
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
>   {
>   	return false;
>   }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index f737e3347059..9ae6f94e868f 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>   	for_each_sg(sgl, sg, nents, i) {
>   		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
>   
> -		if (unlikely(is_swiotlb_buffer(paddr)))
> +		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
>   			swiotlb_sync_single_for_device(dev, paddr, sg->length,
>   						       dir);
>   
> @@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>   		if (!dev_is_dma_coherent(dev))
>   			arch_sync_dma_for_cpu(paddr, sg->length, dir);
>   
> -		if (unlikely(is_swiotlb_buffer(paddr)))
> +		if (unlikely(is_swiotlb_buffer(paddr, sg->length)))
>   			swiotlb_sync_single_for_cpu(dev, paddr, sg->length,
>   						    dir);
>   
> @@ -501,10 +501,10 @@ size_t dma_direct_max_mapping_size(struct device *dev)
>   	return SIZE_MAX;
>   }
>   
> -bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
> +bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>   {
>   	return !dev_is_dma_coherent(dev) ||
> -		is_swiotlb_buffer(dma_to_phys(dev, dma_addr));
> +		is_swiotlb_buffer(dma_to_phys(dev, dma_addr), size);
>   }
>   
>   /**
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 50afc05b6f1d..4a17e431ae56 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -16,7 +16,7 @@ bool dma_direct_can_mmap(struct device *dev);
>   int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
>   		void *cpu_addr, dma_addr_t dma_addr, size_t size,
>   		unsigned long attrs);
> -bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
> +bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size);
>   int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>   		enum dma_data_direction dir, unsigned long attrs);
>   size_t dma_direct_max_mapping_size(struct device *dev);
> @@ -56,7 +56,7 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
>   {
>   	phys_addr_t paddr = dma_to_phys(dev, addr);
>   
> -	if (unlikely(is_swiotlb_buffer(paddr)))
> +	if (unlikely(is_swiotlb_buffer(paddr, size)))
>   		swiotlb_sync_single_for_device(dev, paddr, size, dir);
>   
>   	if (!dev_is_dma_coherent(dev))
> @@ -73,7 +73,7 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
>   		arch_sync_dma_for_cpu_all();
>   	}
>   
> -	if (unlikely(is_swiotlb_buffer(paddr)))
> +	if (unlikely(is_swiotlb_buffer(paddr, size)))
>   		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
>   
>   	if (dir == DMA_FROM_DEVICE)
> @@ -113,7 +113,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>   	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>   		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
>   
> -	if (unlikely(is_swiotlb_buffer(phys)))
> +	if (unlikely(is_swiotlb_buffer(phys, size)))
>   		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>   }
>   #endif /* _KERNEL_DMA_DIRECT_H */
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 2b06a809d0b9..9bf02c8d7d1b 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -716,12 +716,12 @@ size_t dma_max_mapping_size(struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(dma_max_mapping_size);
>   
> -bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
> +bool dma_need_sync(struct device *dev, dma_addr_t dma_addr, size_t size)
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
>   	if (dma_map_direct(dev, ops))
> -		return dma_direct_need_sync(dev, dma_addr);
> +		return dma_direct_need_sync(dev, dma_addr, size);
>   	return ops->sync_single_for_cpu || ops->sync_single_for_device;
>   }
>   EXPORT_SYMBOL_GPL(dma_need_sync);
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 8de01aaac4a0..c1e404fe0cf4 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -399,7 +399,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>   			__xp_dma_unmap(dma_map, attrs);
>   			return -ENOMEM;
>   		}
> -		if (dma_need_sync(dev, dma))
> +		if (dma_need_sync(dev, dma, PAGE_SIZE))
>   			dma_map->dma_need_sync = true;
>   		dma_map->dma_pages[i] = dma;
>   	}
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 6/8] dma: Add return value to dma_unmap_page
  2021-06-03  9:08     ` Robin Murphy
  (?)
@ 2021-06-03 12:36       ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03 12:36 UTC (permalink / raw)
  To: Robin Murphy, mst
  Cc: jasowang, virtualization, hch, m.szyprowski, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


>
> What it looks like to me is abusing SWIOTLB's internal housekeeping to 
> keep track of virtio-specific state. The DMA API does not attempt to 
> validate calls in general since in many cases the additional overhead 
> would be prohibitive. It has always been callers' responsibility to 
> keep track of what they mapped and make sure sync/unmap calls match, 
> and there are many, many, subtle and not-so-subtle ways for things to 
> go wrong if they don't. If virtio is not doing a good enough job of 
> that, what's the justification for making it the DMA API's problem?

In this case it's not prohibitive at all. It's just a matter of adding
a few error returns and checking the overlap (which seems to have been
solved already anyways). I would argue the error returns are good
practice anyways, so that API users can check that something bad is
happening and abort.  The DMA API was never very good at proper error
handling, but there's no reason at all to continue being bad at it
forever.

AFAIK the rest just works anyways, so it's not really a new problem to 
be solved.
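
For reference, the overlap check being referred to boils down to the
range test added in the swiotlb patch earlier in the thread. Here is a
minimal stand-alone model of it (plain userspace C with made-up pool
bounds and invented helper names, not the kernel code itself), so the
behaviour can be compiled and poked at in isolation:

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Assumed bounds of a bounce-buffer pool, for illustration only. */
static const phys_addr_t pool_start = 0x40000000;
static const phys_addr_t pool_end   = 0x44000000;  /* exclusive */

/* Old style: only the start address is checked. */
static bool in_pool_start_only(phys_addr_t paddr)
{
        return paddr >= pool_start && paddr < pool_end;
}

/* New style: the whole [paddr, paddr + size) range has to fit. */
static bool in_pool_ranged(phys_addr_t paddr, size_t size)
{
        if (paddr + size <= paddr)  /* wrapping */
                return false;
        return paddr >= pool_start && paddr + size <= pool_end;
}

int main(void)
{
        /* A mapping that starts inside the pool but runs past its end
         * passes the old check and is rejected by the new one. */
        phys_addr_t evil = pool_end - 0x1000;

        assert(in_pool_start_only(evil));
        assert(!in_pool_ranged(evil, 0x2000));

        /* A fully contained mapping passes both. */
        assert(in_pool_ranged(pool_start, 0x1000));

        /* An absurdly large size is rejected too, whether it wraps
         * around or merely overshoots the end of the pool. */
        assert(!in_pool_ranged(evil, SIZE_MAX));
        return 0;
}

The point is that both ends of the range get compared against the
pool, plus the explicit wrap guard, so a host-supplied size can't
reach outside the shared region.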

>
>> A new callback is used to avoid changing all the IOMMU drivers.
>
> Nit: presumably by "IOMMU drivers" you actually mean arch DMA API 
> backends?
Yes
>
>  Furthermore, AFAICS it's still not going to help against exfiltrating 
> guest memory by over-unmapping the original SWIOTLB slot *without* 
> going past the end of the whole buffer,

That would be just exfiltrating data that is already shared, unless I'm 
misunderstanding you.

-Andi



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03  3:02             ` Jason Wang
  (?)
@ 2021-06-03 13:55               ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03 13:55 UTC (permalink / raw)
  To: Jason Wang, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


> Ok, but what I meant is this: if we don't read from the descriptor 
> ring, and we validate all the other metadata supplied by the device 
> (used id and len), then there should be no way for the device to 
> suppress the dma flags to write to the indirect descriptor table.
>
> Or do you have an example how it can do that?

I don't. If you can validate everything, it's probably ok.

The only drawback is even more code to audit and test.
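
To make the id/len validation concrete, the checks on the consuming
side would be roughly of this shape. This is a simplified stand-alone
model, not the actual virtio_ring code: the structures are cut down
from the split-ring layout, and the driver-side bookkeeping fields are
assumed for the example.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define QUEUE_SIZE 8

/* What the (untrusted) device reports back in the used ring. */
struct used_elem {
        uint32_t id;    /* head descriptor index of the completed chain */
        uint32_t len;   /* bytes the device claims to have written */
};

/* Driver-private bookkeeping, never visible to the device. */
struct desc_state {
        bool     in_flight;     /* did we actually post this id? */
        uint32_t posted_len;    /* buffer length we handed out */
};

static struct desc_state state[QUEUE_SIZE];

/* Return true if the used entry may be consumed, false if the device
 * supplied metadata that is out of bounds or inconsistent. */
static bool validate_used(const struct used_elem *e)
{
        if (e->id >= QUEUE_SIZE)                /* index out of range */
                return false;
        if (!state[e->id].in_flight)            /* never posted, or completed twice */
                return false;
        if (e->len > state[e->id].posted_len)   /* claims more than we gave it */
                return false;
        return true;
}

int main(void)
{
        struct used_elem ok    = { .id = 2,  .len = 256 };
        struct used_elem bogus = { .id = 2,  .len = 1 << 20 };
        struct used_elem wild  = { .id = 99, .len = 16 };

        state[2].in_flight = true;
        state[2].posted_len = 512;

        printf("ok:    %d\n", validate_used(&ok));      /* 1 */
        printf("bogus: %d\n", validate_used(&bogus));   /* 0 */
        printf("wild:  %d\n", validate_used(&wild));    /* 0 */
        return 0;
}

Everything the decision depends on (in_flight, posted_len) lives in
driver-private memory the device cannot touch, which is what makes the
validation meaningful.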

-Andi



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03  0:41   ` Andi Kleen
  (?)
@ 2021-06-03 17:33     ` Andy Lutomirski
  -1 siblings, 0 replies; 116+ messages in thread
From: Andy Lutomirski @ 2021-06-03 17:33 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel

On 6/2/21 5:41 PM, Andi Kleen wrote:
> Only allow split mode when in a protected guest. Follow-on
> patches harden the split mode code paths, and we don't want
> a malicious host to force anything else. Also disallow
> indirect mode for similar reasons.

I read this as "the virtio driver is buggy.  Let's disable most of the
buggy code in one special case in which we need a driver without bugs.
In all the other cases (e.g. hardware virtio device connected over
USB-C), driver bugs are still allowed."

Can we just fix the driver without special cases?

--Andy

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 17:33     ` Andy Lutomirski
  (?)
@ 2021-06-03 18:00       ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03 18:00 UTC (permalink / raw)
  To: Andy Lutomirski, mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	x86, sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 6/3/2021 10:33 AM, Andy Lutomirski wrote:
> On 6/2/21 5:41 PM, Andi Kleen wrote:
>> Only allow split mode when in a protected guest. Follow-on
>> patches harden the split mode code paths, and we don't want
>> a malicious host to force anything else. Also disallow
>> indirect mode for similar reasons.
> I read this as "the virtio driver is buggy.  Let's disable most of the
> buggy code in one special case in which we need a driver without bugs.
> In all the other cases (e.g. hardware virtio device connected over
> USB-C), driver bugs are still allowed."

My understanding is most of the other modes (except for split with separate descriptors) are obsolete and just there for compatibility. As long as they're deprecated they won't harm anyone.

-Andi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 18:00       ` Andi Kleen
  (?)
@ 2021-06-03 19:31         ` Andy Lutomirski
  -1 siblings, 0 replies; 116+ messages in thread
From: Andy Lutomirski @ 2021-06-03 19:31 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	the arch/x86 maintainers, sathyanarayanan.kuppuswamy,
	Josh Poimboeuf, Linux Kernel Mailing List



On Thu, Jun 3, 2021, at 11:00 AM, Andi Kleen wrote:
> 
> On 6/3/2021 10:33 AM, Andy Lutomirski wrote:
> > On 6/2/21 5:41 PM, Andi Kleen wrote:
> >> Only allow split mode when in a protected guest. Follow-on
> >> patches harden the split mode code paths, and we don't want
> >> a malicious host to force anything else. Also disallow
> >> indirect mode for similar reasons.
> > I read this as "the virtio driver is buggy.  Let's disable most of the
> > buggy code in one special case in which we need a driver without bugs.
> > In all the other cases (e.g. hardware virtio device connected over
> > USB-C), driver bugs are still allowed."
> 
> My understanding is most of the other modes (except for split with 
> separate descriptors) are obsolete and just there for compatibility. As 
> long as they're deprecated they won't harm anyone.
> 
>

Tell that to every crypto downgrade attack ever.

I see two credible solutions:

1. Actually harden the virtio driver.

2. Have a new virtio-modern driver and use it for modern use cases. Maybe rename the old driver virtio-legacy or virtio-insecure.  They can share code.

Another snag you may hit: virtio’s heuristic for whether to use proper DMA ops or to bypass them is a giant kludge. I’m very slightly optimistic that getting the heuristic wrong will make the driver fail to operate but won’t allow the host to take over the guest, but I’m not really convinced. And I wrote that code!  A virtio-modern mode probably should not have a heuristic, and the various iommu-bypassing modes should be fixed to work at the bus level, not the device level.
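
As far as I remember the upstream logic, the per-device decision boils
down to something like the model below. This is a stand-alone
simplification, not the real code (which lives in
drivers/virtio/virtio_ring.c and has more going on around Xen and
legacy devices):

#include <stdbool.h>
#include <stdio.h>

struct fake_virtio_dev {
        bool access_platform;   /* device offered VIRTIO_F_ACCESS_PLATFORM? */
};

static bool use_dma_api(const struct fake_virtio_dev *vdev, bool xen_guest)
{
        if (vdev->access_platform)
                return true;    /* device promised to honour platform DMA/IOMMU rules */
        if (xen_guest)
                return true;    /* grant mappings need the DMA API regardless */
        return false;           /* legacy assumption: device sees guest-physical memory */
}

int main(void)
{
        struct fake_virtio_dev modern = { .access_platform = true };
        struct fake_virtio_dev legacy = { .access_platform = false };

        /* The downgrade concern: a host that simply withholds the
         * feature bit steers the guest onto the bypass path. */
        printf("modern: %d\n", use_dma_api(&modern, false));    /* 1 */
        printf("legacy: %d\n", use_dma_api(&legacy, false));    /* 0 */
        return 0;
}

That is why I'd rather see this decided at the bus level: a single
withheld feature bit should not be able to change the memory-safety
story of the whole driver.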

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 19:31         ` Andy Lutomirski
  (?)
@ 2021-06-03 19:53           ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03 19:53 UTC (permalink / raw)
  To: Andy Lutomirski, mst
  Cc: jasowang, virtualization, hch, m.szyprowski, robin.murphy, iommu,
	the arch/x86 maintainers, sathyanarayanan.kuppuswamy,
	Josh Poimboeuf, Linux Kernel Mailing List


> Tell that to every crypto downgrade attack ever.

That's exactly what this patch addresses.

>
> I see two credible solutions:
>
> 1. Actually harden the virtio driver.
That's exactly what this patchkit, and the alternative approaches, like 
Jason's, are doing.
>
> 2. Have a new virtio-modern driver and use it for modern use cases. Maybe rename the old driver virtio-legacy or virtio-insecure.  They can share code.

In most use cases the legacy driver is not insecure because there is no 
memory protection anyways.

Yes maybe such a split would be a good idea for maintenance and maybe 
performance reasons, but at least from the security perspective I don't 
see any need for it.

>
> Another snag you may hit: virtio’s heuristic for whether to use proper DMA ops or to bypass them is a giant kludge. I’m very slightly optimistic that getting the heuristic wrong will make the driver fail to operate but won’t allow the host to take over the guest, but I’m not really convinced. And I wrote that code!  A virtio-modern mode probably should not have a heuristic, and the various iommu-bypassing modes should be fixed to work at the bus level, not the device level

TDX and SEV use the arch hook to enforce DMA API, so that part is also 
solved.
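
Concretely, the hook is the arch_has_restricted_virtio_memory_access()
style check in the virtio core (names from memory): when the
architecture reports protected guest memory, a device that does not
offer VIRTIO_F_ACCESS_PLATFORM is refused during feature negotiation
instead of being allowed onto the bypass path. A rough stand-alone
model of that decision, with made-up return codes:

#include <stdbool.h>
#include <stdio.h>

#define OK              0
#define REJECT_DEVICE  -1

static bool arch_restricted_memory;     /* set for TDX/SEV style guests */

static int finalize_features(bool dev_offers_access_platform,
                             bool *use_dma_api)
{
        if (arch_restricted_memory && !dev_offers_access_platform)
                return REJECT_DEVICE;   /* don't let the host downgrade us */

        *use_dma_api = dev_offers_access_platform;
        return OK;
}

int main(void)
{
        bool dma;

        arch_restricted_memory = true;  /* pretend we are a protected guest */

        printf("modern device: %d\n", finalize_features(true, &dma));   /*  0 */
        printf("legacy device: %d\n", finalize_features(false, &dma));  /* -1 */
        return 0;
}

So on TDX/SEV the bypass can't even be negotiated: the device either
goes through the DMA API or it is rejected outright.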


-Andi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 19:53           ` Andi Kleen
  (?)
@ 2021-06-03 22:17             ` Andy Lutomirski
  -1 siblings, 0 replies; 116+ messages in thread
From: Andy Lutomirski @ 2021-06-03 22:17 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: Jason Wang, virtualization, hch, m.szyprowski, robin.murphy,
	iommu, the arch/x86 maintainers, sathyanarayanan.kuppuswamy,
	Josh Poimboeuf, Linux Kernel Mailing List



On Thu, Jun 3, 2021, at 12:53 PM, Andi Kleen wrote:
> 
> > Tell that to every crypto downgrade attack ever.
> 
> That's exactly what this patch addresses.
> 
> >
> > I see two credible solutions:
> >
> > 1. Actually harden the virtio driver.
> That's exactly what this patchkit, and the alternative approaches, like 
> Jason's, are doing.
> >
> > 2. Have a new virtio-modern driver and use it for modern use cases. Maybe rename the old driver virtio-legacy or virtio-insecure.  They can share code.
> 
> In most use cases the legacy driver is not insecure because there is no 
> memory protection anyways.
> 
> Yes maybe such a split would be a good idea for maintenance and maybe 
> performance reasons, but at least from the security perspective I don't 
> see any need for it.


Please reread my email.

We do not need an increasing pile of kludges to make TDX and SEV “secure”.  We need the actual loaded driver to be secure.  The virtio architecture is full of legacy nonsense, and there is no good reason for SEV and TDX to be a giant special case.

As I said before, real PCIe (Thunderbolt/USB-C or anything else) has the exact same problem.  The fact that TDX has encrypted memory is, at best, a poor proxy for the actual condition.  The actual condition is that the host does not trust the device to implement the virtio protocol correctly.

> 
> >
> > Another snag you may hit: virtio’s heuristic for whether to use proper DMA ops or to bypass them is a giant kludge. I’m very slightly optimistic that getting the heuristic wrong will make the driver fail to operate but won’t allow the host to take over the guest, but I’m not really convinced. And I wrote that code!  A virtio-modern mode probably should not have a heuristic, and the various iommu-bypassing modes should be fixed to work at the bus level, not the device level
> 
> TDX and SEV use the arch hook to enforce DMA API, so that part is also 
> solved.
> 

Can you point me to the code you’re referring to?

> 
> -Andi
> 
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 22:17             ` Andy Lutomirski
  (?)
@ 2021-06-03 23:32               ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-03 23:32 UTC (permalink / raw)
  To: Andy Lutomirski, mst
  Cc: Jason Wang, virtualization, hch, m.szyprowski, robin.murphy,
	iommu, the arch/x86 maintainers, sathyanarayanan.kuppuswamy,
	Josh Poimboeuf, Linux Kernel Mailing List


> We do not need an increasing pile of kludges

Do you mean disabling features is a kludge?

If yes I disagree with that characterization.


> to make TDX and SEV “secure”.  We need the actual loaded driver to be secure.  The virtio architecture is full of legacy nonsense,
> and there is no good reason for SEV and TDX to be a giant special case.

I don't know where you see a "giant special case". Except for the 
limited feature negotiation all the changes are common, and the 
disabling of features (which is not new BTW, but already done e.g. with 
forcing the DMA API in some cases) can of course be used by all these other 
technologies too. But it just cannot be done by default for everything 
because it would break compatibility. So every technology with such 
requirements has to explicitly opt-in.
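
(As an illustration of that opt-in feature filtering, a minimal sketch;
the helper name is made up, only the hook and the feature bits are real.)

/* Illustrative only: drop ring features the hardened split-ring paths
 * do not cover when the architecture has opted in. */
static u64 filter_ring_features(u64 features)
{
        if (!arch_has_restricted_virtio_memory_access())
                return features;        /* normal guests: unchanged */

        features &= ~BIT_ULL(VIRTIO_F_RING_PACKED);          /* split ring only */
        features &= ~BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC);   /* no indirect descriptors */
        return features;
}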


>
> As I said before, real PCIe (Thunderbolt/USB-C or anything else) has the exact same problem.  The fact that TDX has encrypted memory is, at best, a poor proxy for the actual condition.  The actual condition is that the host does not trust the device to implement the virtio protocol correctly.

Right, they can apply similar limitations of feature sets. But again it 
cannot be the default.


>
>>
>> TDX and SEV use the arch hook to enforce DMA API, so that part is also
>> solved.
>>
> Can you point me to the code you’re referring to?

See 4/8 in this patch kit. It uses an existing hook which is already 
used in tree by s390.
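
(From memory, the existing s390 user of that hook looks roughly like the
following; patch 4/8 adds an analogous x86 version gated on TDX. Treat
this as a sketch, not a verbatim excerpt.)

int arch_has_restricted_virtio_memory_access(void)
{
        return is_prot_virt_guest();    /* s390 protected virtualization */
}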


-Andi




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 19:31         ` Andy Lutomirski
  (?)
@ 2021-06-04  1:22           ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-04  1:22 UTC (permalink / raw)
  To: Andy Lutomirski, Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu,
	the arch/x86 maintainers, sathyanarayanan.kuppuswamy,
	Josh Poimboeuf, Linux Kernel Mailing List


On 2021/6/4 3:31 AM, Andy Lutomirski wrote:
>
> On Thu, Jun 3, 2021, at 11:00 AM, Andi Kleen wrote:
>> On 6/3/2021 10:33 AM, Andy Lutomirski wrote:
>>> On 6/2/21 5:41 PM, Andi Kleen wrote:
>>>> Only allow split mode when in a protected guest. Follow-on
>>>> patches harden the split mode code paths, and we don't want
>>>> a malicious host to force anything else. Also disallow
>>>> indirect mode for similar reasons.
>>> I read this as "the virtio driver is buggy.  Let's disable most of the
>>> buggy code in one special case in which we need a driver without bugs.
>>> In all the other cases (e.g. hardware virtio device connected over
>>> USB-C), driver bugs are still allowed."
>> My understanding is most of the other modes (except for split with
>> separate descriptors) are obsolete and just there for compatibility. As
>> long as they're deprecated they won't harm anyone.
>>
>>
> Tell that to every crypto downgrade attack ever.
>
> I see two credible solutions:
>
> 1. Actually harden the virtio driver.
>
> 2. Have a new virtio-modern driver and use it for modern use cases. Maybe rename the old driver virtio-legacy or virtio-insecure.  They can share code.


Note that we have already split the legacy driver out; it can be turned 
off via Kconfig.


>
> Another snag you may hit: virtio’s heuristic for whether to use proper DMA ops or to bypass them is a giant kludge. I’m very slightly optimistic that getting the heuristic wrong will make the driver fail to operate but won’t allow the host to take over the guest, but I’m not really convinced. And I wrote that code!  A virtio-modern mode probably should not have a heuristic, and the various iommu-bypassing modes should be fixed to work at the bus level, not the device level.


I remember there was a very long discussion about this, probably without 
any conclusion. Fortunately, the management layer has been taught to 
enforce VIRTIO_F_ACCESS_PLATFORM for encrypted guests.
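
(As an example, assuming a QEMU-based stack and its virtio-pci device
properties, that enforcement means instantiating devices with something
like the following, which makes the device offer VIRTIO_F_ACCESS_PLATFORM:)

-device virtio-net-pci,disable-legacy=on,iommu_platform=on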

A possible way to fix this without any conflicts is to mandate 
VIRTIO_F_ACCESS_PLATFORM in version 1.2.

Thanks


>


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 18:00       ` Andi Kleen
  (?)
@ 2021-06-04  1:29         ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-04  1:29 UTC (permalink / raw)
  To: Andi Kleen, Andy Lutomirski, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/4 2:00 AM, Andi Kleen wrote:
>
> On 6/3/2021 10:33 AM, Andy Lutomirski wrote:
>> On 6/2/21 5:41 PM, Andi Kleen wrote:
>>> Only allow split mode when in a protected guest. Follow-on
>>> patches harden the split mode code paths, and we don't want
>>> a malicious host to force anything else. Also disallow
>>> indirect mode for similar reasons.
>> I read this as "the virtio driver is buggy.  Let's disable most of the
>> buggy code in one special case in which we need a driver without bugs.
>> In all the other cases (e.g. hardware virtio device connected over
>> USB-C), driver bugs are still allowed."
>
> My understanding is most of the other modes (except for split with 
> separate descriptors) are obsolete and just there for compatibility. 
> As long as they're deprecated they won't harm anyone.
>
> -Andi
>

For "mode" do you packed vs split? If yes, it's not just for 
compatibility. Though packed virtqueue is designed to be more hardware 
friendly, most hardware vendors choose to start from split.
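
(For reference, a simplified sketch of how the ring flavour follows the
negotiated feature bit; the helper names and the trimmed argument lists
are illustrative only.)

/* Illustrative prototypes standing in for the real split/packed setup */
struct virtqueue *create_split_ring(struct virtio_device *vdev);
struct virtqueue *create_packed_ring(struct virtio_device *vdev);

struct virtqueue *create_ring(struct virtio_device *vdev)
{
        if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
                return create_packed_ring(vdev);  /* hardware-friendly layout */
        return create_split_ring(vdev);           /* the layout hardened here */
}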

Thanks


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 23:32               ` Andi Kleen
  (?)
@ 2021-06-04  1:46                 ` Andy Lutomirski
  -1 siblings, 0 replies; 116+ messages in thread
From: Andy Lutomirski @ 2021-06-04  1:46 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: Jason Wang, virtualization, hch, m.szyprowski, robin.murphy,
	iommu, the arch/x86 maintainers, sathyanarayanan.kuppuswamy,
	Josh Poimboeuf, Linux Kernel Mailing List

On 6/3/21 4:32 PM, Andi Kleen wrote:
> 
>> We do not need an increasing pile of kludges
> 
> Do you mean disabling features is a kludge?
> 
> If yes I disagree with that characterization.
> 
> 
>> to make TDX and SEV “secure”.  We need the actual loaded driver to be
>> secure.  The virtio architecture is full of legacy nonsense,
>> and there is no good reason for SEV and TDX to be a giant special case.
> 
> I don't know where you see a "giant special case". Except for the
> limited feature negotiation all the changes are common, and the
> disabling of features (which is not new BTW, but already done e.g. with
> forcing the DMA API in some cases) can of course be used by all these other
> technologies too. But it just cannot be done by default for everything
> because it would break compatibility. So every technology with such
> requirements has to explicitly opt-in.
> 
> 
>>
>> As I said before, real PCIe (Thunderbolt/USB-C or anything else) has
>> the exact same problem.  The fact that TDX has encrypted memory is, at
>> best, a poor proxy for the actual condition.  The actual condition is
>> that the host does not trust the device to implement the virtio
>> protocol correctly.
> 
> Right, they can apply similar limitations of feature sets. But again it
> cannot be the default.

Let me try again.

For most Linux drivers, a report that a misbehaving device can corrupt
host memory is a bug, not a feature.  If a USB device can corrupt kernel
memory, that's a serious bug.  If a USB-C device can corrupt kernel
memory, that's also a serious bug, although, sadly, we probably have
lots of these bugs.  If a Firewire device can corrupt kernel memory,
news at 11.  If a Bluetooth or WiFi peer can corrupt kernel memory,
people write sonnets about it and give it clever names.  Why is virtio
special?

If, for some reason, the virtio driver cannot be fixed so that it is
secure and compatible [1], then I think that the limited cases that are
secure should be accessible to anyone, with or without TDX.  Have a
virtio.secure_mode module option or a udev-controllable parameter or an
alternative driver name or *something*.  An alternative driver name
would allow userspace to prevent the insecure mode from auto-binding to
devices.  And make whatever system configures encrypted guests for
security use this mode.  (Linux is not going to be magically secure just
by booting it in TDX.  There's a whole process of unsealing or remote
attestation, something needs to prevent the hypervisor from connecting a
virtual keyboard and typing init=/bin/bash, something needs to provision
an SSH key, etc.)
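
(A purely hypothetical sketch of such a module option, only to
illustrate the suggestion; nothing like this exists in the tree.)

/* Hypothetical: a global opt-in knob instead of keying off TDX/SEV */
static bool secure_mode;
module_param(secure_mode, bool, 0444);
MODULE_PARM_DESC(secure_mode,
                 "Refuse device features the hardened code paths do not cover");

static bool virtio_feature_allowed(unsigned int fbit)
{
        if (!secure_mode)
                return true;
        /* same restrictions the TDX patches apply, but host-type agnostic */
        return fbit != VIRTIO_F_RING_PACKED &&
               fbit != VIRTIO_RING_F_INDIRECT_DESC;
}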

In my opinion, it is not so great to identify bugs in the driver and
then say that they're only being fixed for TDX and SEV.

Keep in mind that, as I understand it, there is nothing virt specific
about virtio.  There are real physical devices that speak virtio.

[1] The DMA quirk is nasty.  Fortunately, it's the only case I'm aware
of in which the virtio driver genuinely cannot be made secure and
compatible at the same time.  Also, fortunately, most real deployments 
except on powerpc work just fine with the DMA quirk unquirked.

> 
> 
>>
>>>
>>> TDX and SEV use the arch hook to enforce DMA API, so that part is also
>>> solved.
>>>
>> Can you point me to the code you’re referring to?
> 
> See 4/8 in this patch kit. It uses an existing hook which is already
> used in tree by s390.

This one:

int arch_has_restricted_virtio_memory_access(void)
+{
+	return is_tdx_guest();
+}

I'm looking at a fairly recent kernel, and I don't see anything for s390
wired up in vring_use_dma_api.


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-04  1:46                 ` Andy Lutomirski
  (?)
@ 2021-06-04  1:54                   ` Andi Kleen
  -1 siblings, 0 replies; 116+ messages in thread
From: Andi Kleen @ 2021-06-04  1:54 UTC (permalink / raw)
  To: Andy Lutomirski, mst
  Cc: Jason Wang, virtualization, hch, m.szyprowski, robin.murphy,
	iommu, the arch/x86 maintainers, sathyanarayanan.kuppuswamy,
	Josh Poimboeuf, Linux Kernel Mailing List


> For most Linux drivers, a report that a misbehaving device can corrupt
> host memory is a bug, not a feature.  If a USB device can corrupt kernel
> memory, that's a serious bug.  If a USB-C device can corrupt kernel
> memory, that's also a serious bug, although, sadly, we probably have
> lots of these bugs.  If a Firewire device can corrupt kernel memory,
> news at 11.  If a Bluetooth or WiFi peer can corrupt kernel memory,
> people write sonnets about it and give it clever names.  Why is virtio
> special?

Well, for most cases it's pointless because they don't have any memory 
protection anyway.

Why break compatibility if it does not buy you anything?

Anyway, if you want to enable the restricted mode for something else, 
it's easy to do. The cases where it matters already seem to work with it, 
like the user space virtio ring.

My changes for boundary checking are enabled unconditionally anyway, as 
are the other patchkits.


>
> This one:
>
> int arch_has_restricted_virtio_memory_access(void)
> +{
> +	return is_tdx_guest();
> +}
>
> I'm looking at a fairly recent kernel, and I don't see anything for s390
> wired up in vring_use_dma_api.

It's not using vring_use_dma_api, but enforces the DMA API at virtio 
ring setup time, same as SEV/TDX.
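
(For context, a from-memory sketch of the vring_use_dma_api() heuristic
being contrasted with here; not a verbatim excerpt.)

static bool vring_use_dma_api(struct virtio_device *vdev)
{
        if (!virtio_has_dma_quirk(vdev))
                return true;    /* VIRTIO_F_ACCESS_PLATFORM was negotiated */

        /* Otherwise we are left to guess; Xen always needs the DMA API */
        if (xen_domain())
                return true;

        return false;           /* legacy quirk: bypass the DMA API */
}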

-Andi


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 17:33     ` Andy Lutomirski
  (?)
@ 2021-06-04  2:20       ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-04  2:20 UTC (permalink / raw)
  To: Andy Lutomirski, Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/4 1:33 AM, Andy Lutomirski wrote:
> On 6/2/21 5:41 PM, Andi Kleen wrote:
>> Only allow split mode when in a protected guest. Follow-on
>> patches harden the split mode code paths, and we don't want
>> a malicious host to force anything else. Also disallow
>> indirect mode for similar reasons.
> I read this as "the virtio driver is buggy.  Let's disable most of the
> buggy code in one special case in which we need a driver without bugs.
> In all the other cases (e.g. hardware virtio device connected over
> USB-C), driver bugs are still allowed."
>
> Can we just fix the driver without special cases?


I think we can; this is what this series tries to do:

https://www.spinics.net/lists/kvm/msg241825.html

It tries to fix the issues without special-casing any specific features.

Thanks



>
> --Andy
>


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v1 1/8] virtio: Force only split mode with protected guest
  2021-06-03 13:55               ` Andi Kleen
  (?)
@ 2021-06-04  2:29                 ` Jason Wang
  -1 siblings, 0 replies; 116+ messages in thread
From: Jason Wang @ 2021-06-04  2:29 UTC (permalink / raw)
  To: Andi Kleen, mst
  Cc: virtualization, hch, m.szyprowski, robin.murphy, iommu, x86,
	sathyanarayanan.kuppuswamy, jpoimboe, linux-kernel


On 2021/6/3 9:55 PM, Andi Kleen wrote:
>
>> Ok, but what I meant is this, if we don't read from the descriptor 
>> ring, and validate all the other metadata supplied by the device 
>> (used id and len). Then there should be no way for the device to 
>> suppress the dma flags to write to the indirect descriptor table.
>>
>> Or do you have an example how it can do that?
>
> I don't. If you can validate everything it's probably ok
>
> The only drawback is even more code to audit and test.
>
> -Andi
>
>

OK, then I'm going to post a formal series; please have a look and we 
can start from there.
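
(For illustration, the used-ring metadata validation discussed above can
be sketched as below; the structure and field names are made up for the
example, they are not the actual code from either series.)

/* Made-up bookkeeping for the example, not the driver's real state */
struct vq_state_sketch {
        u32 num;                        /* ring size */
        struct {
                bool in_flight;         /* did we make this id available? */
                u32 mapped_len;         /* bytes we actually mapped for it */
        } desc_state[];                 /* indexed by descriptor id */
};

static bool used_entry_sane(const struct vq_state_sketch *vq, u32 id, u32 len)
{
        if (id >= vq->num)
                return false;           /* index outside the ring */
        if (!vq->desc_state[id].in_flight)
                return false;           /* not a buffer we handed out */
        if (len > vq->desc_state[id].mapped_len)
                return false;           /* device claims more than we mapped */
        return true;
}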

Thanks


^ permalink raw reply	[flat|nested] 116+ messages in thread

end of thread (newest message: 2021-06-04  2:30 UTC)

Thread overview: 116+ messages
2021-06-03  0:41 Virtio hardening for TDX Andi Kleen
2021-06-03  0:41 ` [PATCH v1 1/8] virtio: Force only split mode with protected guest Andi Kleen
2021-06-03  1:36   ` Jason Wang
2021-06-03  1:48     ` Andi Kleen
2021-06-03  2:32       ` Jason Wang
2021-06-03  2:56         ` Andi Kleen
2021-06-03  3:02           ` Jason Wang
2021-06-03 13:55             ` Andi Kleen
2021-06-04  2:29               ` Jason Wang
2021-06-03 17:33   ` Andy Lutomirski
2021-06-03 18:00     ` Andi Kleen
2021-06-03 19:31       ` Andy Lutomirski
2021-06-03 19:53         ` Andi Kleen
2021-06-03 22:17           ` Andy Lutomirski
2021-06-03 23:32             ` Andi Kleen
2021-06-04  1:46               ` Andy Lutomirski
2021-06-04  1:54                 ` Andi Kleen
2021-06-04  1:22         ` Jason Wang
2021-06-04  1:29       ` Jason Wang
2021-06-04  2:20     ` Jason Wang
2021-06-03  0:41 ` [PATCH v1 2/8] virtio: Add boundary checks to virtio ring Andi Kleen
2021-06-03  2:14   ` Jason Wang
2021-06-03  2:18     ` Andi Kleen
2021-06-03  2:36       ` Jason Wang
2021-06-03  0:41 ` [PATCH v1 3/8] virtio: Harden split buffer detachment Andi Kleen
2021-06-03  2:29   ` Jason Wang
2021-06-03  0:41 ` [PATCH v1 4/8] x86/tdx: Add arch_has_restricted_memory_access for TDX Andi Kleen
2021-06-03  4:02   ` Kuppuswamy, Sathyanarayanan
2021-06-03  0:41 ` [PATCH v1 5/8] dma: Use size for swiotlb boundary checks Andi Kleen
2021-06-03  1:48   ` Konrad Rzeszutek Wilk
2021-06-03  2:03     ` Andi Kleen
2021-06-03  9:09   ` Robin Murphy
2021-06-03  0:41 ` [PATCH v1 6/8] dma: Add return value to dma_unmap_page Andi Kleen
2021-06-03  9:08   ` Robin Murphy
2021-06-03 12:36     ` Andi Kleen
2021-06-03  0:41 ` [PATCH v1 7/8] virtio: Abort IO when descriptor points outside forced swiotlb Andi Kleen
2021-06-03  0:41 ` [PATCH v1 8/8] virtio: Error out on endless free lists Andi Kleen
2021-06-03  1:34 ` Virtio hardening for TDX Jason Wang
2021-06-03  1:56   ` Andi Kleen
