virtualization.lists.linux-foundation.org archive mirror
* [PATCH vhost v6 00/11] virtio core prepares for AF_XDP
@ 2023-03-27  4:05 Xuan Zhuo
  2023-03-27  4:05 ` [PATCH vhost v6 01/11] virtio_ring: split: separate dma codes Xuan Zhuo
                   ` (10 more replies)
  0 siblings, 11 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. Its
zero-copy feature needs to be supported by the driver, and the zero-copy
performance is very good.

ENV: Qemu with vhost.

                   vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
-----------------------------|---------------|------------------|------------
xmit by sockperf:     90%    |   100%        |                  |  318967
xmit by xsk:          100%   |   30%         |   33%            | 1192064
recv by sockperf:     100%   |   68%         |   100%           |  692288
recv by xsk:          100%   |   33%         |   43%            |  771670

Before implementing this functionality in virtio-net, we first have to make
the virtio core support these features:

1. virtio core support premapped
2. virtio core support reset per-queue
3. introduce DMA APIs to virtio core

Please review.

Thanks.

v6:
 1. change the size of the flags to u32.

v5:
 1. fix the error handling path
 2. add flags to record the internal dma mapping

v4:
 1. rename map_inter to dma_map_internal
 2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'

v3:
 1. add map_inter to the desc state struct to record whether the virtio core did the dma map

v2:
 1. use sgs[0]->dma_address to judge whether the buffer is premapped
 2. use extra.addr to judge whether to unmap a non-indirect desc
 3. use indir_desc to judge whether to unmap an indirect desc
 4. rename virtqueue_get_dma_dev to virtqueue_dma_dev

v1:
 1. expose the dma device; do NOT introduce the APIs for dma map and sync
 2. split some commits for easier review

Xuan Zhuo (11):
  virtio_ring: split: separate dma codes
  virtio_ring: packed: separate dma codes
  virtio_ring: packed-indirect: separate dma codes
  virtio_ring: split: support premapped
  virtio_ring: packed: support premapped
  virtio_ring: packed-indirect: support premapped
  virtio_ring: update document for virtqueue_add_*
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: correct the expression of the description of
    virtqueue_resize()
  virtio_ring: separate the logic of reset/enable from virtqueue_resize
  virtio_ring: introduce virtqueue_reset()

 drivers/virtio/virtio.c      |   6 +
 drivers/virtio/virtio_ring.c | 352 +++++++++++++++++++++++++----------
 include/linux/virtio.h       |   4 +
 3 files changed, 265 insertions(+), 97 deletions(-)

--
2.32.0.3.g01195cf9f

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* [PATCH vhost v6 01/11] virtio_ring: split: separate dma codes
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-03-28  6:26   ` Jason Wang
  2023-03-27  4:05 ` [PATCH vhost v6 02/11] virtio_ring: packed: " Xuan Zhuo
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

DMA-related logic is separated from virtqueue_add_split() into a new
function. The DMA address is saved as sg->dma_address if use_dma_api is
true, and virtqueue_add_split() then uses it directly. The unmap
operation becomes simpler.

The purpose of this is to facilitate subsequent support for receiving
DMA addresses mapped by drivers.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 122 +++++++++++++++++++++++++++--------
 1 file changed, 94 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 41144b5246a8..2aafb7da793d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -379,6 +379,14 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
 			    direction);
 }
 
+static dma_addr_t vring_sg_address(struct scatterlist *sg)
+{
+	if (sg->dma_address)
+		return sg->dma_address;
+
+	return (dma_addr_t)sg_phys(sg);
+}
+
 static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
 				   void *cpu_addr, size_t size,
 				   enum dma_data_direction direction)
@@ -520,6 +528,80 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
 	return next;
 }
 
+static void virtqueue_unmap_sgs(struct vring_virtqueue *vq,
+				struct scatterlist *sgs[],
+				unsigned int total_sg,
+				unsigned int out_sgs,
+				unsigned int in_sgs)
+{
+	struct scatterlist *sg;
+	unsigned int n;
+
+	if (!vq->use_dma_api)
+		return;
+
+	for (n = 0; n < out_sgs; n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			if (!sg->dma_address)
+				return;
+
+			dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+				       sg->length, DMA_TO_DEVICE);
+		}
+	}
+
+	for (; n < (out_sgs + in_sgs); n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			if (!sg->dma_address)
+				return;
+
+			dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+				       sg->length, DMA_FROM_DEVICE);
+		}
+	}
+}
+
+static int virtqueue_map_sgs(struct vring_virtqueue *vq,
+			     struct scatterlist *sgs[],
+			     unsigned int total_sg,
+			     unsigned int out_sgs,
+			     unsigned int in_sgs)
+{
+	struct scatterlist *sg;
+	unsigned int n;
+
+	if (!vq->use_dma_api)
+		return 0;
+
+	for (n = 0; n < out_sgs; n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
+
+			if (vring_mapping_error(vq, addr))
+				goto err;
+
+			sg->dma_address = addr;
+		}
+	}
+
+	for (; n < (out_sgs + in_sgs); n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
+
+			if (vring_mapping_error(vq, addr))
+				goto err;
+
+			sg->dma_address = addr;
+		}
+	}
+
+	return 0;
+
+err:
+	virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+	return -ENOMEM;
+}
+
 static inline int virtqueue_add_split(struct virtqueue *_vq,
 				      struct scatterlist *sgs[],
 				      unsigned int total_sg,
@@ -532,9 +614,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct scatterlist *sg;
 	struct vring_desc *desc;
-	unsigned int i, n, avail, descs_used, prev, err_idx;
-	int head;
+	unsigned int i, n, avail, descs_used, prev;
 	bool indirect;
+	int head;
 
 	START_USE(vq);
 
@@ -586,32 +668,30 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		return -ENOSPC;
 	}
 
+	if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+		goto err_map;
+
 	for (n = 0; n < out_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
-			if (vring_mapping_error(vq, addr))
-				goto unmap_release;
-
 			prev = i;
 			/* Note that we trust indirect descriptor
 			 * table since it use stream DMA mapping.
 			 */
-			i = virtqueue_add_desc_split(_vq, desc, i, addr, sg->length,
+			i = virtqueue_add_desc_split(_vq, desc, i,
+						     vring_sg_address(sg),
+						     sg->length,
 						     VRING_DESC_F_NEXT,
 						     indirect);
 		}
 	}
 	for (; n < (out_sgs + in_sgs); n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
-				goto unmap_release;
-
 			prev = i;
 			/* Note that we trust indirect descriptor
 			 * table since it use stream DMA mapping.
 			 */
-			i = virtqueue_add_desc_split(_vq, desc, i, addr,
+			i = virtqueue_add_desc_split(_vq, desc, i,
+						     vring_sg_address(sg),
 						     sg->length,
 						     VRING_DESC_F_NEXT |
 						     VRING_DESC_F_WRITE,
@@ -679,23 +759,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	return 0;
 
 unmap_release:
-	err_idx = i;
-
-	if (indirect)
-		i = 0;
-	else
-		i = head;
-
-	for (n = 0; n < total_sg; n++) {
-		if (i == err_idx)
-			break;
-		if (indirect) {
-			vring_unmap_one_split_indirect(vq, &desc[i]);
-			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
-		} else
-			i = vring_unmap_one_split(vq, i);
-	}
+	virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
+err_map:
 	if (indirect)
 		kfree(desc);
 
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v6 02/11] virtio_ring: packed: separate dma codes
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
  2023-03-27  4:05 ` [PATCH vhost v6 01/11] virtio_ring: split: separate dma codes Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-03-27  4:05 ` [PATCH vhost v6 03/11] virtio_ring: packed-indirect: " Xuan Zhuo
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

DMA-related logic is separated out of virtqueue_add_packed(). The DMA
address is saved as sg->dma_address, and virtqueue_add_packed() then
uses it directly. The unmap operation becomes simpler.

The purpose of this is to facilitate subsequent support for receiving
DMA addresses mapped by drivers.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 39 +++++++++---------------------------
 1 file changed, 9 insertions(+), 30 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 2aafb7da793d..b1bf7266daa0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1431,9 +1431,9 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct vring_packed_desc *desc;
 	struct scatterlist *sg;
-	unsigned int i, n, c, descs_used, err_idx;
+	unsigned int i, n, c, descs_used;
 	__le16 head_flags, flags;
-	u16 head, id, prev, curr, avail_used_flags;
+	u16 head, id, prev, curr;
 	int err;
 
 	START_USE(vq);
@@ -1462,7 +1462,6 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	}
 
 	head = vq->packed.next_avail_idx;
-	avail_used_flags = vq->packed.avail_used_flags;
 
 	WARN_ON_ONCE(total_sg > vq->packed.vring.num && !vq->indirect);
 
@@ -1480,15 +1479,15 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	id = vq->free_head;
 	BUG_ON(id == vq->packed.vring.num);
 
+	if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs)) {
+		END_USE(vq);
+		return -EIO;
+	}
+
 	curr = id;
 	c = 0;
 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-					DMA_TO_DEVICE : DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
-				goto unmap_release;
-
 			flags = cpu_to_le16(vq->packed.avail_used_flags |
 				    (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
 				    (n < out_sgs ? 0 : VRING_DESC_F_WRITE));
@@ -1497,12 +1496,12 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 			else
 				desc[i].flags = flags;
 
-			desc[i].addr = cpu_to_le64(addr);
+			desc[i].addr = cpu_to_le64(vring_sg_address(sg));
 			desc[i].len = cpu_to_le32(sg->length);
 			desc[i].id = cpu_to_le16(id);
 
 			if (unlikely(vq->use_dma_api)) {
-				vq->packed.desc_extra[curr].addr = addr;
+				vq->packed.desc_extra[curr].addr = vring_sg_address(sg);
 				vq->packed.desc_extra[curr].len = sg->length;
 				vq->packed.desc_extra[curr].flags =
 					le16_to_cpu(flags);
@@ -1548,26 +1547,6 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	END_USE(vq);
 
 	return 0;
-
-unmap_release:
-	err_idx = i;
-	i = head;
-	curr = vq->free_head;
-
-	vq->packed.avail_used_flags = avail_used_flags;
-
-	for (n = 0; n < total_sg; n++) {
-		if (i == err_idx)
-			break;
-		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]);
-		curr = vq->packed.desc_extra[curr].next;
-		i++;
-		if (i >= vq->packed.vring.num)
-			i = 0;
-	}
-
-	END_USE(vq);
-	return -EIO;
 }
 
 static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v6 03/11] virtio_ring: packed-indirect: separate dma codes
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
  2023-03-27  4:05 ` [PATCH vhost v6 01/11] virtio_ring: split: separate dma codes Xuan Zhuo
  2023-03-27  4:05 ` [PATCH vhost v6 02/11] virtio_ring: packed: " Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-03-27  4:05 ` [PATCH vhost v6 04/11] virtio_ring: split: support premapped Xuan Zhuo
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

DMA-related logic is separated out of virtqueue_add_indirect_packed().

The DMA address is saved as sg->dma_address, and
virtqueue_add_indirect_packed() then uses it directly. The unmap
operation becomes simpler.

The purpose of this is to facilitate subsequent support for receiving
DMA addresses mapped by drivers.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b1bf7266daa0..3ada30b475d2 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1315,7 +1315,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 {
 	struct vring_packed_desc *desc;
 	struct scatterlist *sg;
-	unsigned int i, n, err_idx;
+	unsigned int i, n;
 	u16 head, id;
 	dma_addr_t addr;
 
@@ -1335,16 +1335,14 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	id = vq->free_head;
 	BUG_ON(id == vq->packed.vring.num);
 
+	if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+		goto err_map;
+
 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-					DMA_TO_DEVICE : DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
-				goto unmap_release;
-
 			desc[i].flags = cpu_to_le16(n < out_sgs ?
 						0 : VRING_DESC_F_WRITE);
-			desc[i].addr = cpu_to_le64(addr);
+			desc[i].addr = cpu_to_le64(vring_sg_address(sg));
 			desc[i].len = cpu_to_le32(sg->length);
 			i++;
 		}
@@ -1408,11 +1406,9 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	return 0;
 
 unmap_release:
-	err_idx = i;
-
-	for (i = 0; i < err_idx; i++)
-		vring_unmap_desc_packed(vq, &desc[i]);
+	virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
+err_map:
 	kfree(desc);
 
 	END_USE(vq);
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v6 04/11] virtio_ring: split: support premapped
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
                   ` (2 preceding siblings ...)
  2023-03-27  4:05 ` [PATCH vhost v6 03/11] virtio_ring: packed-indirect: " Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-03-28  6:27   ` Jason Wang
  2023-03-27  4:05 ` [PATCH vhost v6 05/11] virtio_ring: packed: " Xuan Zhuo
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

The virtio core currently supports only virtual addresses; DMA mapping
is completed inside the virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and the DMA
mapping is completed in advance, so it is necessary to support passing
the DMA address to the virtio core.

Drivers can use sg->dma_address to pass a mapped DMA address to the
virtio core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise, all dma_address fields must be NULL when
passing the sgs to the virtio APIs.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3ada30b475d2..9d6acd59e3e0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -67,9 +67,13 @@
 #define LAST_ADD_TIME_INVALID(vq)
 #endif
 
+#define VRING_STATE_F_MAP_INTERNAL BIT(0)
+
 struct vring_desc_state_split {
 	void *data;			/* Data for callback. */
 	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
+	u32 flags;			/* State flags. */
+	u32 padding;
 };
 
 struct vring_desc_state_packed {
@@ -448,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
 }
 
 static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
-					  unsigned int i)
+					  unsigned int i, bool dma_map_internal)
 {
 	struct vring_desc_extra *extra = vq->split.desc_extra;
 	u16 flags;
@@ -465,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
+		if (!dma_map_internal)
+			goto out;
+
 		dma_unmap_page(vring_dma_dev(vq),
 			       extra[i].addr,
 			       extra[i].len,
@@ -615,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	struct scatterlist *sg;
 	struct vring_desc *desc;
 	unsigned int i, n, avail, descs_used, prev;
-	bool indirect;
+	bool indirect, dma_map_internal;
 	int head;
 
 	START_USE(vq);
@@ -668,7 +675,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		return -ENOSPC;
 	}
 
-	if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+	dma_map_internal = !sgs[0]->dma_address;
+	if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
 		goto err_map;
 
 	for (n = 0; n < out_sgs; n++) {
@@ -735,6 +743,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	else
 		vq->split.desc_state[head].indir_desc = ctx;
 
+	vq->split.desc_state[head].flags = dma_map_internal ? VRING_STATE_F_MAP_INTERNAL : 0;
+
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
 	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
@@ -759,7 +769,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	return 0;
 
 unmap_release:
-	virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+	if (dma_map_internal)
+		virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
 err_map:
 	if (indirect)
@@ -805,20 +816,22 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 {
 	unsigned int i, j;
 	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+	bool dma_map_internal;
 
 	/* Clear data ptr. */
 	vq->split.desc_state[head].data = NULL;
+	dma_map_internal = !!(vq->split.desc_state[head].flags & VRING_STATE_F_MAP_INTERNAL);
 
 	/* Put back on free list: unmap first-level descriptors and find end */
 	i = head;
 
 	while (vq->split.vring.desc[i].flags & nextflag) {
-		vring_unmap_one_split(vq, i);
+		vring_unmap_one_split(vq, i, dma_map_internal);
 		i = vq->split.desc_extra[i].next;
 		vq->vq.num_free++;
 	}
 
-	vring_unmap_one_split(vq, i);
+	vring_unmap_one_split(vq, i, dma_map_internal);
 	vq->split.desc_extra[i].next = vq->free_head;
 	vq->free_head = head;
 
@@ -840,8 +853,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 				VRING_DESC_F_INDIRECT));
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-		for (j = 0; j < len / sizeof(struct vring_desc); j++)
-			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+		if (dma_map_internal) {
+			for (j = 0; j < len / sizeof(struct vring_desc); j++)
+				vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+		}
 
 		kfree(indir_desc);
 		vq->split.desc_state[head].indir_desc = NULL;
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v6 05/11] virtio_ring: packed: support premapped
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
                   ` (3 preceding siblings ...)
  2023-03-27  4:05 ` [PATCH vhost v6 04/11] virtio_ring: split: support premapped Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-03-28  6:27   ` Jason Wang
  2023-03-27  4:05 ` [PATCH vhost v6 06/11] virtio_ring: packed-indirect: " Xuan Zhuo
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

The virtio core currently supports only virtual addresses; DMA mapping
is completed inside the virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and the DMA
mapping is completed in advance, so it is necessary to support passing
the DMA address to the virtio core.

Drivers can use sg->dma_address to pass a mapped DMA address to the
virtio core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise, all dma_address fields must be NULL when
passing the sgs to the virtio APIs.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9d6acd59e3e0..2afff1dc6c74 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -81,6 +81,7 @@ struct vring_desc_state_packed {
 	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
 	u16 num;			/* Descriptor list length. */
 	u16 last;			/* The last desc state in a list. */
+	u32 flags;			/* State flags. */
 };
 
 struct vring_desc_extra {
@@ -1264,7 +1265,8 @@ static inline u16 packed_last_used(u16 last_used_idx)
 }
 
 static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
-				     struct vring_desc_extra *extra)
+				     struct vring_desc_extra *extra,
+				     bool dma_map_internal)
 {
 	u16 flags;
 
@@ -1279,6 +1281,9 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
+		if (!dma_map_internal)
+			return;
+
 		dma_unmap_page(vring_dma_dev(vq),
 			       extra->addr, extra->len,
 			       (flags & VRING_DESC_F_WRITE) ?
@@ -1445,6 +1450,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	unsigned int i, n, c, descs_used;
 	__le16 head_flags, flags;
 	u16 head, id, prev, curr;
+	bool dma_map_internal;
 	int err;
 
 	START_USE(vq);
@@ -1490,7 +1496,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	id = vq->free_head;
 	BUG_ON(id == vq->packed.vring.num);
 
-	if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs)) {
+	dma_map_internal = !sgs[0]->dma_address;
+	if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs)) {
 		END_USE(vq);
 		return -EIO;
 	}
@@ -1544,6 +1551,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	vq->packed.desc_state[id].data = data;
 	vq->packed.desc_state[id].indir_desc = ctx;
 	vq->packed.desc_state[id].last = prev;
+	vq->packed.desc_state[id].flags = dma_map_internal ? VRING_STATE_F_MAP_INTERNAL : 0;
 
 	/*
 	 * A driver MUST NOT make the first descriptor in the list
@@ -1615,8 +1623,10 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 	struct vring_desc_state_packed *state = NULL;
 	struct vring_packed_desc *desc;
 	unsigned int i, curr;
+	bool dma_map_internal;
 
 	state = &vq->packed.desc_state[id];
+	dma_map_internal = !!(state->flags & VRING_STATE_F_MAP_INTERNAL);
 
 	/* Clear data ptr. */
 	state->data = NULL;
@@ -1629,7 +1639,8 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		curr = id;
 		for (i = 0; i < state->num; i++) {
 			vring_unmap_extra_packed(vq,
-						 &vq->packed.desc_extra[curr]);
+						 &vq->packed.desc_extra[curr],
+						 dma_map_internal);
 			curr = vq->packed.desc_extra[curr].next;
 		}
 	}
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v6 06/11] virtio_ring: packed-indirect: support premapped
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
                   ` (4 preceding siblings ...)
  2023-03-27  4:05 ` [PATCH vhost v6 05/11] virtio_ring: packed: " Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-03-28  6:29   ` Jason Wang
  2023-03-27  4:05 ` [PATCH vhost v6 07/11] virtio_ring: update document for virtqueue_add_* Xuan Zhuo
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

The virtio core currently supports only virtual addresses; DMA mapping
is completed inside the virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and the DMA
mapping is completed in advance, so it is necessary to support passing
the DMA address to the virtio core.

Drivers can use sg->dma_address to pass a mapped DMA address to the
virtio core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise, all dma_address fields must be NULL when
passing the sgs to the virtio APIs.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 2afff1dc6c74..d5dffbe50070 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1338,6 +1338,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	unsigned int i, n;
 	u16 head, id;
 	dma_addr_t addr;
+	bool dma_map_internal;
 
 	head = vq->packed.next_avail_idx;
 	desc = alloc_indirect_packed(total_sg, gfp);
@@ -1355,7 +1356,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	id = vq->free_head;
 	BUG_ON(id == vq->packed.vring.num);
 
-	if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+	dma_map_internal = !sgs[0]->dma_address;
+	if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
 		goto err_map;
 
 	for (n = 0; n < out_sgs + in_sgs; n++) {
@@ -1417,6 +1419,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	vq->packed.desc_state[id].data = data;
 	vq->packed.desc_state[id].indir_desc = desc;
 	vq->packed.desc_state[id].last = id;
+	vq->packed.desc_state[id].flags = dma_map_internal ? VRING_STATE_F_MAP_INTERNAL : 0;
+
 
 	vq->num_added += 1;
 
@@ -1426,7 +1430,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	return 0;
 
 unmap_release:
-	virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+	if (dma_map_internal)
+		virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
 err_map:
 	kfree(desc);
@@ -1653,7 +1658,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		if (!desc)
 			return;
 
-		if (vq->use_dma_api) {
+		if (vq->use_dma_api && dma_map_internal) {
 			len = vq->packed.desc_extra[id].len;
 			for (i = 0; i < len / sizeof(struct vring_packed_desc);
 					i++)
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v6 07/11] virtio_ring: update document for virtqueue_add_*
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
                   ` (5 preceding siblings ...)
  2023-03-27  4:05 ` [PATCH vhost v6 06/11] virtio_ring: packed-indirect: " Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-03-28  6:30   ` Jason Wang
  2023-03-27  4:05 ` [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev() Xuan Zhuo
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

Update the documentation of the virtqueue_add_* family of APIs, allowing
callers to use sg->dma_address to pass a DMA address to the virtio core.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d5dffbe50070..0b198ec3ff92 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2202,6 +2202,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_sgs(struct virtqueue *_vq,
@@ -2236,6 +2240,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_outbuf(struct virtqueue *vq,
@@ -2258,6 +2266,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_inbuf(struct virtqueue *vq,
@@ -2281,6 +2293,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
                   ` (6 preceding siblings ...)
  2023-03-27  4:05 ` [PATCH vhost v6 07/11] virtio_ring: update document for virtqueue_add_* Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-04-06 17:20   ` Guenter Roeck
  2023-03-27  4:05 ` [PATCH vhost v6 09/11] virtio_ring: correct the expression of the description of virtqueue_resize() Xuan Zhuo
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

Add virtqueue_dma_dev() to return the DMA device for virtio, so that
the caller can perform DMA operations in advance. The purpose is to keep
memory mapped across multiple add/get buf operations.
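As a usage sketch (hypothetical driver code, not part of this series; `vq`, `buf`, `len` and the surrounding context are assumed), a driver could pre-map a buffer against the returned device and hand the address to the ring via sg->dma_address, per patches 04-06:

```c
/* Sketch: pre-map a TX buffer and submit it as a premapped sg. */
struct device *dma_dev = virtqueue_dma_dev(vq);
struct scatterlist sg;
dma_addr_t addr;
int err;

addr = dma_map_single(dma_dev, buf, len, DMA_TO_DEVICE);
if (dma_mapping_error(dma_dev, addr))
	return -ENOMEM;

sg_init_one(&sg, buf, len);
sg.dma_address = addr;		/* tells virtio core: already mapped */

err = virtqueue_add_outbuf(vq, &sg, 1, buf, GFP_ATOMIC);
if (err)
	dma_unmap_single(dma_dev, addr, len, DMA_TO_DEVICE);
```

Because the mapping is owned by the driver, it survives across add/get buf cycles until the driver unmaps it.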

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio.c      |  6 ++++++
 drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
 include/linux/virtio.h       |  2 ++
 3 files changed, 25 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb26..11c5035369e2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/dma-mapping.h>
 #include <linux/virtio.h>
 #include <linux/spinlock.h>
 #include <linux/virtio_config.h>
@@ -243,6 +244,11 @@ static int virtio_dev_probe(struct device *_d)
 	u64 driver_features;
 	u64 driver_features_legacy;
 
+	_d->dma_mask = &_d->coherent_dma_mask;
+	err = dma_set_mask_and_coherent(_d, DMA_BIT_MASK(64));
+	if (err)
+		return err;
+
 	/* We have a driver! */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0b198ec3ff92..7b538c16c9b5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2309,6 +2309,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the DMA device
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the DMA device, which can be used with the DMA API.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (vq->use_dma_api)
+		return vring_dma_dev(vq);
+	else
+		return &vq->vq.vdev->dev;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 2b472514c49b..1fa50191cf0a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
 		      void *data,
 		      gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH vhost v6 09/11] virtio_ring: correct the expression of the description of virtqueue_resize()
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
                   ` (7 preceding siblings ...)
  2023-03-27  4:05 ` [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev() Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-03-27  4:05 ` [PATCH vhost v6 10/11] virtio_ring: separate the logic of reset/enable from virtqueue_resize Xuan Zhuo
  2023-03-27  4:05 ` [PATCH vhost v6 11/11] virtio_ring: introduce virtqueue_reset() Xuan Zhuo
  10 siblings, 0 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

Replace "useless" with the more accurate "unused".

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 7b538c16c9b5..46516d86aa0a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2714,7 +2714,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  * virtqueue_resize - resize the vring of vq
  * @_vq: the struct virtqueue we're talking about.
  * @num: new ring num
- * @recycle: callback for recycle the useless buffer
+ * @recycle: callback to recycle unused buffers
  *
  * When it is really necessary to create a new vring, it will set the current vq
  * into the reset state. Then call the passed callback to recycle the buffer
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH vhost v6 10/11] virtio_ring: separate the logic of reset/enable from virtqueue_resize
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
                   ` (8 preceding siblings ...)
  2023-03-27  4:05 ` [PATCH vhost v6 09/11] virtio_ring: correct the expression of the description of virtqueue_resize() Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  2023-03-27  4:05 ` [PATCH vhost v6 11/11] virtio_ring: introduce virtqueue_reset() Xuan Zhuo
  10 siblings, 0 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

The subsequent reset function will reuse this logic.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 58 ++++++++++++++++++++++++------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 46516d86aa0a..6b5d29fbb83d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2168,6 +2168,43 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
 	return -ENOMEM;
 }
 
+static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
+					 void (*recycle)(struct virtqueue *vq, void *buf))
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct virtio_device *vdev = vq->vq.vdev;
+	void *buf;
+	int err;
+
+	if (!vq->we_own_ring)
+		return -EPERM;
+
+	if (!vdev->config->disable_vq_and_reset)
+		return -ENOENT;
+
+	if (!vdev->config->enable_vq_after_reset)
+		return -ENOENT;
+
+	err = vdev->config->disable_vq_and_reset(_vq);
+	if (err)
+		return err;
+
+	while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
+		recycle(_vq, buf);
+
+	return 0;
+}
+
+static int virtqueue_enable_after_reset(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct virtio_device *vdev = vq->vq.vdev;
+
+	if (vdev->config->enable_vq_after_reset(_vq))
+		return -EBUSY;
+
+	return 0;
+}
 
 /*
  * Generic functions and exported symbols.
@@ -2738,13 +2775,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 		     void (*recycle)(struct virtqueue *vq, void *buf))
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	struct virtio_device *vdev = vq->vq.vdev;
-	void *buf;
 	int err;
 
-	if (!vq->we_own_ring)
-		return -EPERM;
-
 	if (num > vq->vq.num_max)
 		return -E2BIG;
 
@@ -2754,28 +2786,16 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 	if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == num)
 		return 0;
 
-	if (!vdev->config->disable_vq_and_reset)
-		return -ENOENT;
-
-	if (!vdev->config->enable_vq_after_reset)
-		return -ENOENT;
-
-	err = vdev->config->disable_vq_and_reset(_vq);
+	err = virtqueue_disable_and_recycle(_vq, recycle);
 	if (err)
 		return err;
 
-	while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
-		recycle(_vq, buf);
-
 	if (vq->packed_ring)
 		err = virtqueue_resize_packed(_vq, num);
 	else
 		err = virtqueue_resize_split(_vq, num);
 
-	if (vdev->config->enable_vq_after_reset(_vq))
-		return -EBUSY;
-
-	return err;
+	return virtqueue_enable_after_reset(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH vhost v6 11/11] virtio_ring: introduce virtqueue_reset()
  2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
                   ` (9 preceding siblings ...)
  2023-03-27  4:05 ` [PATCH vhost v6 10/11] virtio_ring: separate the logic of reset/enable from virtqueue_resize Xuan Zhuo
@ 2023-03-27  4:05 ` Xuan Zhuo
  10 siblings, 0 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-03-27  4:05 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

Introduce virtqueue_reset() to release all buffers inside the vq.
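A caller might use it roughly as follows (a sketch; `free_old_buf()` and `drv_reset_rxq()` are hypothetical driver functions, and refill logic is omitted):

```c
/* Sketch: reset an rx queue, recycling every detached buffer. */
static void free_old_buf(struct virtqueue *vq, void *buf)
{
	/* driver-specific: return the buffer to its pool */
	put_page(virt_to_head_page(buf));
}

static int drv_reset_rxq(struct virtqueue *vq)
{
	int err;

	err = virtqueue_reset(vq, free_old_buf);
	if (err)
		return err;	/* -EPERM, -ENOENT or -EBUSY */

	/* refill the now-empty ring here */
	return 0;
}
```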

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 33 +++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6b5d29fbb83d..e4cffd63fd06 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2799,6 +2799,39 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
+/**
+ * virtqueue_reset - detach and recycle all unused buffers
+ * @_vq: the struct virtqueue we're talking about.
+ * @recycle: callback to recycle unused buffers
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EBUSY: Failed to sync with device, vq may not work properly
+ * -ENOENT: Transport or device not supported
+ * -EPERM: Operation not permitted
+ */
+int virtqueue_reset(struct virtqueue *_vq,
+		    void (*recycle)(struct virtqueue *vq, void *buf))
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	int err;
+
+	err = virtqueue_disable_and_recycle(_vq, recycle);
+	if (err)
+		return err;
+
+	if (vq->packed_ring)
+		virtqueue_reinit_packed(vq);
+	else
+		virtqueue_reinit_split(vq);
+
+	return virtqueue_enable_after_reset(_vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_reset);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 1fa50191cf0a..22bbd06ef8c8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -97,6 +97,8 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
 		     void (*recycle)(struct virtqueue *vq, void *buf));
+int virtqueue_reset(struct virtqueue *vq,
+		    void (*recycle)(struct virtqueue *vq, void *buf));
 
 /**
  * struct virtio_device - representation of a device using virtio
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH vhost v6 01/11] virtio_ring: split: separate dma codes
  2023-03-27  4:05 ` [PATCH vhost v6 01/11] virtio_ring: split: separate dma codes Xuan Zhuo
@ 2023-03-28  6:26   ` Jason Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Jason Wang @ 2023-03-28  6:26 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Michael S. Tsirkin, virtualization

On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> DMA-related logic is separated out of virtqueue_add_split() into a
> new function. The DMA address is saved as sg->dma_address when
> use_dma_api is true, and virtqueue_add_split() then uses it directly.
> The unmap operation also becomes simpler.
>
> The purpose is to make it easier to later accept DMA addresses
> already mapped by drivers.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 122 +++++++++++++++++++++++++++--------
>  1 file changed, 94 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 41144b5246a8..2aafb7da793d 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -379,6 +379,14 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
>                             direction);
>  }
>
> +static dma_addr_t vring_sg_address(struct scatterlist *sg)
> +{
> +       if (sg->dma_address)
> +               return sg->dma_address;
> +
> +       return (dma_addr_t)sg_phys(sg);
> +}
> +
>  static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
>                                    void *cpu_addr, size_t size,
>                                    enum dma_data_direction direction)
> @@ -520,6 +528,80 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
>         return next;
>  }
>
> +static void virtqueue_unmap_sgs(struct vring_virtqueue *vq,
> +                               struct scatterlist *sgs[],
> +                               unsigned int total_sg,
> +                               unsigned int out_sgs,
> +                               unsigned int in_sgs)
> +{
> +       struct scatterlist *sg;
> +       unsigned int n;
> +
> +       if (!vq->use_dma_api)
> +               return;
> +
> +       for (n = 0; n < out_sgs; n++) {
> +               for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +                       if (!sg->dma_address)
> +                               return;
> +
> +                       dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
> +                                      sg->length, DMA_TO_DEVICE);
> +               }
> +       }
> +
> +       for (; n < (out_sgs + in_sgs); n++) {
> +               for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +                       if (!sg->dma_address)
> +                               return;
> +
> +                       dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
> +                                      sg->length, DMA_FROM_DEVICE);
> +               }
> +       }
> +}
> +
> +static int virtqueue_map_sgs(struct vring_virtqueue *vq,
> +                            struct scatterlist *sgs[],
> +                            unsigned int total_sg,
> +                            unsigned int out_sgs,
> +                            unsigned int in_sgs)
> +{
> +       struct scatterlist *sg;
> +       unsigned int n;
> +
> +       if (!vq->use_dma_api)
> +               return 0;
> +
> +       for (n = 0; n < out_sgs; n++) {
> +               for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +                       dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> +
> +                       if (vring_mapping_error(vq, addr))
> +                               goto err;
> +
> +                       sg->dma_address = addr;
> +               }
> +       }
> +
> +       for (; n < (out_sgs + in_sgs); n++) {
> +               for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +                       dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> +
> +                       if (vring_mapping_error(vq, addr))
> +                               goto err;
> +
> +                       sg->dma_address = addr;
> +               }
> +       }
> +
> +       return 0;
> +
> +err:
> +       virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> +       return -ENOMEM;
> +}
> +
>  static inline int virtqueue_add_split(struct virtqueue *_vq,
>                                       struct scatterlist *sgs[],
>                                       unsigned int total_sg,
> @@ -532,9 +614,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>         struct vring_virtqueue *vq = to_vvq(_vq);
>         struct scatterlist *sg;
>         struct vring_desc *desc;
> -       unsigned int i, n, avail, descs_used, prev, err_idx;
> -       int head;
> +       unsigned int i, n, avail, descs_used, prev;
>         bool indirect;
> +       int head;
>
>         START_USE(vq);
>
> @@ -586,32 +668,30 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                 return -ENOSPC;
>         }
>
> +       if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
> +               goto err_map;
> +
>         for (n = 0; n < out_sgs; n++) {
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -                       dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> -                       if (vring_mapping_error(vq, addr))
> -                               goto unmap_release;
> -
>                         prev = i;
>                         /* Note that we trust indirect descriptor
>                          * table since it use stream DMA mapping.
>                          */
> -                       i = virtqueue_add_desc_split(_vq, desc, i, addr, sg->length,
> +                       i = virtqueue_add_desc_split(_vq, desc, i,
> +                                                    vring_sg_address(sg),
> +                                                    sg->length,
>                                                      VRING_DESC_F_NEXT,
>                                                      indirect);
>                 }
>         }
>         for (; n < (out_sgs + in_sgs); n++) {
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -                       dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> -                       if (vring_mapping_error(vq, addr))
> -                               goto unmap_release;
> -
>                         prev = i;
>                         /* Note that we trust indirect descriptor
>                          * table since it use stream DMA mapping.
>                          */
> -                       i = virtqueue_add_desc_split(_vq, desc, i, addr,
> +                       i = virtqueue_add_desc_split(_vq, desc, i,
> +                                                    vring_sg_address(sg),
>                                                      sg->length,
>                                                      VRING_DESC_F_NEXT |
>                                                      VRING_DESC_F_WRITE,
> @@ -679,23 +759,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>         return 0;
>
>  unmap_release:
> -       err_idx = i;
> -
> -       if (indirect)
> -               i = 0;
> -       else
> -               i = head;
> -
> -       for (n = 0; n < total_sg; n++) {
> -               if (i == err_idx)
> -                       break;
> -               if (indirect) {
> -                       vring_unmap_one_split_indirect(vq, &desc[i]);
> -                       i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> -               } else
> -                       i = vring_unmap_one_split(vq, i);
> -       }
> +       virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
>
> +err_map:
>         if (indirect)
>                 kfree(desc);
>
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH vhost v6 04/11] virtio_ring: split: support premapped
  2023-03-27  4:05 ` [PATCH vhost v6 04/11] virtio_ring: split: support premapped Xuan Zhuo
@ 2023-03-28  6:27   ` Jason Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Jason Wang @ 2023-03-28  6:27 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Michael S. Tsirkin, virtualization

On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> virtio core only supports virtual addresses; DMA mapping is completed
> inside virtio core.
>
> In some scenarios (such as AF_XDP), the memory is allocated and DMA
> mapped in advance, so we need to support passing the DMA address to
> virtio core.
>
> Drivers can use sg->dma_address to pass the mapped DMA address to
> virtio core. If one sg->dma_address is set, then all sgs must have
> dma_address set; otherwise, all must be NULL when passed to the
> virtio APIs.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 31 +++++++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 3ada30b475d2..9d6acd59e3e0 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -67,9 +67,13 @@
>  #define LAST_ADD_TIME_INVALID(vq)
>  #endif
>
> +#define VRING_STATE_F_MAP_INTERNAL BIT(0)
> +
>  struct vring_desc_state_split {
>         void *data;                     /* Data for callback. */
>         struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> +       u32 flags;                      /* State flags. */
> +       u32 padding;
>  };
>
>  struct vring_desc_state_packed {
> @@ -448,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
>  }
>
>  static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> -                                         unsigned int i)
> +                                         unsigned int i, bool dma_map_internal)
>  {
>         struct vring_desc_extra *extra = vq->split.desc_extra;
>         u16 flags;
> @@ -465,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>                                  (flags & VRING_DESC_F_WRITE) ?
>                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
>         } else {
> +               if (!dma_map_internal)
> +                       goto out;
> +
>                 dma_unmap_page(vring_dma_dev(vq),
>                                extra[i].addr,
>                                extra[i].len,
> @@ -615,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>         struct scatterlist *sg;
>         struct vring_desc *desc;
>         unsigned int i, n, avail, descs_used, prev;
> -       bool indirect;
> +       bool indirect, dma_map_internal;
>         int head;
>
>         START_USE(vq);
> @@ -668,7 +675,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                 return -ENOSPC;
>         }
>
> -       if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
> +       dma_map_internal = !sgs[0]->dma_address;
> +       if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
>                 goto err_map;
>
>         for (n = 0; n < out_sgs; n++) {
> @@ -735,6 +743,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>         else
>                 vq->split.desc_state[head].indir_desc = ctx;
>
> +       vq->split.desc_state[head].flags = dma_map_internal ? VRING_STATE_F_MAP_INTERNAL : 0;
> +
>         /* Put entry in available array (but don't update avail->idx until they
>          * do sync). */
>         avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> @@ -759,7 +769,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>         return 0;
>
>  unmap_release:
> -       virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> +       if (dma_map_internal)
> +               virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
>
>  err_map:
>         if (indirect)
> @@ -805,20 +816,22 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  {
>         unsigned int i, j;
>         __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> +       bool dma_map_internal;
>
>         /* Clear data ptr. */
>         vq->split.desc_state[head].data = NULL;
> +       dma_map_internal = !!(vq->split.desc_state[head].flags & VRING_STATE_F_MAP_INTERNAL);
>
>         /* Put back on free list: unmap first-level descriptors and find end */
>         i = head;
>
>         while (vq->split.vring.desc[i].flags & nextflag) {
> -               vring_unmap_one_split(vq, i);
> +               vring_unmap_one_split(vq, i, dma_map_internal);
>                 i = vq->split.desc_extra[i].next;
>                 vq->vq.num_free++;
>         }
>
> -       vring_unmap_one_split(vq, i);
> +       vring_unmap_one_split(vq, i, dma_map_internal);
>         vq->split.desc_extra[i].next = vq->free_head;
>         vq->free_head = head;
>
> @@ -840,8 +853,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>                                 VRING_DESC_F_INDIRECT));
>                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>
> -               for (j = 0; j < len / sizeof(struct vring_desc); j++)
> -                       vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> +               if (dma_map_internal) {
> +                       for (j = 0; j < len / sizeof(struct vring_desc); j++)
> +                               vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> +               }
>
>                 kfree(indir_desc);
>                 vq->split.desc_state[head].indir_desc = NULL;
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH vhost v6 05/11] virtio_ring: packed: support premapped
  2023-03-27  4:05 ` [PATCH vhost v6 05/11] virtio_ring: packed: " Xuan Zhuo
@ 2023-03-28  6:27   ` Jason Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Jason Wang @ 2023-03-28  6:27 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Michael S. Tsirkin, virtualization

On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> virtio core only supports virtual addresses; DMA mapping is completed
> inside virtio core.
>
> In some scenarios (such as AF_XDP), the memory is allocated and DMA
> mapped in advance, so we need to support passing the DMA address to
> virtio core.
>
> Drivers can use sg->dma_address to pass the mapped DMA address to
> virtio core. If one sg->dma_address is set, then all sgs must have
> dma_address set; otherwise, all must be NULL when passed to the
> virtio APIs.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 9d6acd59e3e0..2afff1dc6c74 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -81,6 +81,7 @@ struct vring_desc_state_packed {
>         struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
>         u16 num;                        /* Descriptor list length. */
>         u16 last;                       /* The last desc state in a list. */
> +       u32 flags;                      /* State flags. */
>  };
>
>  struct vring_desc_extra {
> @@ -1264,7 +1265,8 @@ static inline u16 packed_last_used(u16 last_used_idx)
>  }
>
>  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> -                                    struct vring_desc_extra *extra)
> +                                    struct vring_desc_extra *extra,
> +                                    bool dma_map_internal)
>  {
>         u16 flags;
>
> @@ -1279,6 +1281,9 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>                                  (flags & VRING_DESC_F_WRITE) ?
>                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
>         } else {
> +               if (!dma_map_internal)
> +                       return;
> +
>                 dma_unmap_page(vring_dma_dev(vq),
>                                extra->addr, extra->len,
>                                (flags & VRING_DESC_F_WRITE) ?
> @@ -1445,6 +1450,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>         unsigned int i, n, c, descs_used;
>         __le16 head_flags, flags;
>         u16 head, id, prev, curr;
> +       bool dma_map_internal;
>         int err;
>
>         START_USE(vq);
> @@ -1490,7 +1496,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>         id = vq->free_head;
>         BUG_ON(id == vq->packed.vring.num);
>
> -       if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs)) {
> +       dma_map_internal = !sgs[0]->dma_address;
> +       if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs)) {
>                 END_USE(vq);
>                 return -EIO;
>         }
> @@ -1544,6 +1551,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>         vq->packed.desc_state[id].data = data;
>         vq->packed.desc_state[id].indir_desc = ctx;
>         vq->packed.desc_state[id].last = prev;
> +       vq->packed.desc_state[id].flags = dma_map_internal ? VRING_STATE_F_MAP_INTERNAL : 0;
>
>         /*
>          * A driver MUST NOT make the first descriptor in the list
> @@ -1615,8 +1623,10 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>         struct vring_desc_state_packed *state = NULL;
>         struct vring_packed_desc *desc;
>         unsigned int i, curr;
> +       bool dma_map_internal;
>
>         state = &vq->packed.desc_state[id];
> +       dma_map_internal = !!(state->flags & VRING_STATE_F_MAP_INTERNAL);
>
>         /* Clear data ptr. */
>         state->data = NULL;
> @@ -1629,7 +1639,8 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>                 curr = id;
>                 for (i = 0; i < state->num; i++) {
>                         vring_unmap_extra_packed(vq,
> -                                                &vq->packed.desc_extra[curr]);
> +                                                &vq->packed.desc_extra[curr],
> +                                                dma_map_internal);
>                         curr = vq->packed.desc_extra[curr].next;
>                 }
>         }
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH vhost v6 06/11] virtio_ring: packed-indirect: support premapped
  2023-03-27  4:05 ` [PATCH vhost v6 06/11] virtio_ring: packed-indirect: " Xuan Zhuo
@ 2023-03-28  6:29   ` Jason Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Jason Wang @ 2023-03-28  6:29 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Michael S. Tsirkin, virtualization

On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> virtio core only supports virtual addresses, dma is completed in virtio
> core.
>
> In some scenarios (such as the AF_XDP), the memory is allocated
> and DMA mapping is completed in advance, so it is necessary for us to
> support passing the DMA address to virtio core.
>
> Drivers can use sg->dma_address to pass a mapped dma address to virtio
> core. If one sg->dma_address is used, then all sgs must use sg->dma_address;
> otherwise all dma_address must be NULL when passing them to the APIs of
> virtio.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 2afff1dc6c74..d5dffbe50070 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1338,6 +1338,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>         unsigned int i, n;
>         u16 head, id;
>         dma_addr_t addr;
> +       bool dma_map_internal;
>
>         head = vq->packed.next_avail_idx;
>         desc = alloc_indirect_packed(total_sg, gfp);
> @@ -1355,7 +1356,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>         id = vq->free_head;
>         BUG_ON(id == vq->packed.vring.num);
>
> -       if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
> +       dma_map_internal = !sgs[0]->dma_address;
> +       if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
>                 goto err_map;
>
>         for (n = 0; n < out_sgs + in_sgs; n++) {
> @@ -1417,6 +1419,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>         vq->packed.desc_state[id].data = data;
>         vq->packed.desc_state[id].indir_desc = desc;
>         vq->packed.desc_state[id].last = id;
> +       vq->packed.desc_state[id].flags = dma_map_internal ? VRING_STATE_F_MAP_INTERNAL : 0;
> +
>
>         vq->num_added += 1;
>
> @@ -1426,7 +1430,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>         return 0;
>
>  unmap_release:
> -       virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> +       if (dma_map_internal)
> +               virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
>
>  err_map:
>         kfree(desc);
> @@ -1653,7 +1658,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>                 if (!desc)
>                         return;
>
> -               if (vq->use_dma_api) {
> +               if (vq->use_dma_api && dma_map_internal) {
>                         len = vq->packed.desc_extra[id].len;
>                         for (i = 0; i < len / sizeof(struct vring_packed_desc);
>                                         i++)
> --
> 2.32.0.3.g01195cf9f
>
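The mapping decision in the patch above can be modeled outside the kernel. The sketch below is plain C with hypothetical stand-in types (not the kernel's real `struct scatterlist` or descriptor-state structures); it mirrors the `dma_map_internal = !sgs[0]->dma_address` test on add and the flag check that guards unmapping on detach:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types (illustration only). */
typedef unsigned long long dma_addr_t;

struct scatterlist {
	dma_addr_t dma_address;	/* 0 means "not premapped by the caller" */
};

#define VRING_STATE_F_MAP_INTERNAL (1u << 0)

/* Mirrors the patch: the core maps only when the first sg carries no
 * caller-provided DMA address, and records that decision in the
 * per-descriptor state flags. */
static unsigned int add_buf_flags(struct scatterlist **sgs)
{
	bool dma_map_internal = !sgs[0]->dma_address;

	return dma_map_internal ? VRING_STATE_F_MAP_INTERNAL : 0;
}

/* On detach, unmap only if the core did the mapping itself. */
static bool should_unmap(unsigned int state_flags)
{
	return !!(state_flags & VRING_STATE_F_MAP_INTERNAL);
}
```

Under this convention a driver that premaps its buffers sets every sg->dma_address before calling virtqueue_add_*, and the core neither maps nor unmaps them.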


* Re: [PATCH vhost v6 07/11] virtio_ring: update document for virtqueue_add_*
  2023-03-27  4:05 ` [PATCH vhost v6 07/11] virtio_ring: update document for virtqueue_add_* Xuan Zhuo
@ 2023-03-28  6:30   ` Jason Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Jason Wang @ 2023-03-28  6:30 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Michael S. Tsirkin, virtualization

On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Update the documentation of the virtqueue_add_* API series, allowing
> callers to use sg->dma_address to pass the dma address to the virtio core.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d5dffbe50070..0b198ec3ff92 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2202,6 +2202,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> + * If the caller has done dma map then use sg->dma_address to pass dma address.
> + * If one sg->dma_address is used, then all sgs must use sg->dma_address;
> + * otherwise all sg->dma_address must be NULL.
> + *
>   * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
>   */
>  int virtqueue_add_sgs(struct virtqueue *_vq,
> @@ -2236,6 +2240,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> + * If the caller has done dma map then use sg->dma_address to pass dma address.
> + * If one sg->dma_address is used, then all sgs must use sg->dma_address;
> + * otherwise all sg->dma_address must be NULL.
> + *
>   * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
>   */
>  int virtqueue_add_outbuf(struct virtqueue *vq,
> @@ -2258,6 +2266,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> + * If the caller has done dma map then use sg->dma_address to pass dma address.
> + * If one sg->dma_address is used, then all sgs must use sg->dma_address;
> + * otherwise all sg->dma_address must be NULL.
> + *
>   * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
>   */
>  int virtqueue_add_inbuf(struct virtqueue *vq,
> @@ -2281,6 +2293,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> + * If the caller has done dma map then use sg->dma_address to pass dma address.
> + * If one sg->dma_address is used, then all sgs must use sg->dma_address;
> + * otherwise all sg->dma_address must be NULL.
> + *
>   * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
>   */
>  int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
> --
> 2.32.0.3.g01195cf9f
>


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-03-27  4:05 ` [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev() Xuan Zhuo
@ 2023-04-06 17:20   ` Guenter Roeck
  2023-04-07  3:17     ` Xuan Zhuo
  2023-04-07 11:02     ` Michael S. Tsirkin
  0 siblings, 2 replies; 57+ messages in thread
From: Guenter Roeck @ 2023-04-06 17:20 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Michael S. Tsirkin, virtualization

Hi,

On Mon, Mar 27, 2023 at 12:05:33PM +0800, Xuan Zhuo wrote:
> Added virtqueue_dma_dev() to get the DMA device for virtio. Then the
> caller can do DMA operations in advance. The purpose is to keep memory
> mapped across multiple add/get buf operations.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Acked-by: Jason Wang <jasowang@redhat.com>

On sparc64, this patch results in

[    4.364643] Unable to handle kernel NULL pointer dereference
[    4.365157] tsk->{mm,active_mm}->context = 0000000000000000
[    4.365394] tsk->{mm,active_mm}->pgd = fffff80000402000
[    4.365611]               \|/ ____ \|/
[    4.365611]               "@'/ .. \`@"
[    4.365611]               /_| \__/ |_\
[    4.365611]                  \__U_/
[    4.366165] swapper/0(1): Oops [#1]
[    4.366630] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.3.0-rc5-next-20230405 #1
[    4.367121] TSTATE: 0000004411001606 TPC: 00000000004375c0 TNPC: 00000000004375c4 Y: 00000002    Tainted: G                 N
[    4.367548] TPC: <dma_4u_supported+0x20/0x40>
[    4.368111] g0: 0000000000000000 g1: 0000000001a93a50 g2: 0000000000000000 g3: 0000000001a96170
[    4.368439] g4: fffff8000416e9a0 g5: fffff8001dc98000 g6: fffff80004170000 g7: 0000000000000005
[    4.368769] o0: 0000000000000000 o1: ffffffffffffffff o2: 0000000000000001 o3: fffff800040b78d8
[    4.369095] o4: 0000000000000000 o5: 0000000000000000 sp: fffff80004172d51 ret_pc: 00000000004375ac
[    4.369934] RPC: <dma_4u_supported+0xc/0x40>
[    4.370160] l0: 0000000000000002 l1: 0000000000000002 l2: 0000000000000000 l3: 00000000ffffffff
[    4.370493] l4: 0000000000000001 l5: 000000000119d2b0 l6: fffff8000416e9a0 l7: 00000000018df000
[    4.370820] i0: 0000000000000001 i1: ffffffffffffffff i2: fffff80004657758 i3: 000000000125ac00
[    4.371145] i4: 0000000002362400 i5: 0000000000000000 i6: fffff80004172e01 i7: 000000000050e288
[    4.371473] I7: <dma_set_mask+0x28/0x80>
[    4.371683] Call Trace:
[    4.371864] [<000000000050e288>] dma_set_mask+0x28/0x80
[    4.372123] [<0000000000a83234>] virtio_dev_probe+0x14/0x400
[    4.372348] [<0000000000ac7a18>] really_probe+0xb8/0x340
[    4.372555] [<0000000000ac7d64>] driver_probe_device+0x24/0x120
[    4.372794] [<0000000000ac8010>] __driver_attach+0x90/0x1a0
[    4.373012] [<0000000000ac5b4c>] bus_for_each_dev+0x4c/0xa0
[    4.373226] [<0000000000ac6d80>] bus_add_driver+0x140/0x1e0
[    4.373440] [<0000000000ac8d94>] driver_register+0x74/0x120
[    4.373653] [<0000000001b2a690>] virtio_net_driver_init+0x74/0xa8
[    4.373886] [<0000000000427ee0>] do_one_initcall+0x60/0x340
[    4.374099] [<0000000001b02f50>] kernel_init_freeable+0x1bc/0x228
[    4.374330] [<0000000000f37264>] kernel_init+0x18/0x134
[    4.374534] [<00000000004060e8>] ret_from_fork+0x1c/0x2c
[    4.374738] [<0000000000000000>] 0x0
[    4.374981] Disabling lock debugging due to kernel taint
[    4.375237] Caller[000000000050e288]: dma_set_mask+0x28/0x80
[    4.375477] Caller[0000000000a83234]: virtio_dev_probe+0x14/0x400
[    4.375704] Caller[0000000000ac7a18]: really_probe+0xb8/0x340
[    4.375916] Caller[0000000000ac7d64]: driver_probe_device+0x24/0x120
[    4.376146] Caller[0000000000ac8010]: __driver_attach+0x90/0x1a0
[    4.376365] Caller[0000000000ac5b4c]: bus_for_each_dev+0x4c/0xa0
[    4.376583] Caller[0000000000ac6d80]: bus_add_driver+0x140/0x1e0
[    4.376805] Caller[0000000000ac8d94]: driver_register+0x74/0x120
[    4.377025] Caller[0000000001b2a690]: virtio_net_driver_init+0x74/0xa8
[    4.377262] Caller[0000000000427ee0]: do_one_initcall+0x60/0x340
[    4.377480] Caller[0000000001b02f50]: kernel_init_freeable+0x1bc/0x228
[    4.377721] Caller[0000000000f37264]: kernel_init+0x18/0x134
[    4.377930] Caller[00000000004060e8]: ret_from_fork+0x1c/0x2c
[    4.378140] Caller[0000000000000000]: 0x0
[    4.378309] Instruction DUMP:
[    4.378339]  80a22000
[    4.378556]  12400006
[    4.378654]  b0102001
[    4.378746] <c2076658>
[    4.378838]  b0102000
[    4.378930]  80a04019
[    4.379022]  b1653001
[    4.379115]  81cfe008
[    4.379507]  913a2000
[    4.379617]
[    4.380504] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

Reverting it fixes the problem. Bisect log attached.

The reason is that dma_supported() calls dma_4u_supported(), which
expects that dev->archdata.iommu has been initialized. However,
this is not the case for the virtio device. The result is a null pointer
dereference in dma_4u_supported().

static int dma_4u_supported(struct device *dev, u64 device_mask)
{
        struct iommu *iommu = dev->archdata.iommu;

        if (ali_sound_dma_hack(dev, device_mask))
                return 1;

        if (device_mask < iommu->dma_addr_mask)
	                  ^^^^^^^^^^^^^^^^^^^^ Crash location
                return 0;
        return 1;
}
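
The crash condition above can be modeled in a few lines. The sketch below is plain C with hypothetical simplified types (not the real sparc64 `struct device`); it illustrates the failure mode — the virtio device never gets a per-device IOMMU, so the unguarded dereference faults — and how a NULL check would report "unsupported" instead of crashing. This is only an illustration, not the fix that was ultimately applied:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the sparc64 structures. */
struct iommu {
	unsigned long long dma_addr_mask;
};

struct device {
	struct iommu *iommu;	/* models dev->archdata.iommu */
};

/* Unlike the kernel's dma_4u_supported(), this sketch checks for a
 * missing per-device IOMMU before dereferencing it — the condition
 * the virtio device hits in the report above. */
static int dma_4u_supported_sketch(struct device *dev,
				   unsigned long long device_mask)
{
	struct iommu *iommu = dev->iommu;

	if (!iommu)		/* virtio device: archdata.iommu == NULL */
		return 0;	/* report "unsupported" instead of crashing */

	if (device_mask < iommu->dma_addr_mask)
		return 0;
	return 1;
}
```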

Guenter

---
# bad: [8417c8f5007bf4567ccffda850a3157c7d905f67] Add linux-next specific files for 20230405
# good: [7e364e56293bb98cae1b55fd835f5991c4e96e7d] Linux 6.3-rc5
git bisect start 'HEAD' 'v6.3-rc5'
# good: [699c146eb5a03458f29e94cfde4d7dd7a36deeb4] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect good 699c146eb5a03458f29e94cfde4d7dd7a36deeb4
# good: [efe74e6476a9f04263288009910f07a26597386f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
git bisect good efe74e6476a9f04263288009910f07a26597386f
# good: [24be607eedb226c1627973190a0b65cab39440b9] Merge branch 'char-misc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
git bisect good 24be607eedb226c1627973190a0b65cab39440b9
# good: [5213285e3f212cf266c085c1c11041adda2bc63f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
git bisect good 5213285e3f212cf266c085c1c11041adda2bc63f
# bad: [e8f5d7e1787104da5977773ba5b3e1d502fdb9e3] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm.git
git bisect bad e8f5d7e1787104da5977773ba5b3e1d502fdb9e3
# bad: [c08b5ad3300122790cef1bf7c1f51c554c778e4d] Merge branch 'gpio/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
git bisect bad c08b5ad3300122790cef1bf7c1f51c554c778e4d
# good: [4ea0c97776bf8c63805eb0f8182d9c20072219d9] gpiolib: Check array_info for NULL only once in gpiod_get_array()
git bisect good 4ea0c97776bf8c63805eb0f8182d9c20072219d9
# good: [b8e4bb0f05bef8334e689618c75cfed122f3a292] vdpa_sim: use kthread worker
git bisect good b8e4bb0f05bef8334e689618c75cfed122f3a292
# bad: [ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
git bisect bad ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e
# good: [56c8cf37fb0800ad66c926eb3d044e50691afa6e] virtio_ring: packed: separate dma codes
git bisect good 56c8cf37fb0800ad66c926eb3d044e50691afa6e
# good: [f19986b93ab7aa23f76dedf9fd8657f865324f78] virtio_ring: update document for virtqueue_add_*
git bisect good f19986b93ab7aa23f76dedf9fd8657f865324f78
# bad: [60602b367dc928e2b3d6c1f21df5d05f90e37fa3] virtio_ring: correct the expression of the description of virtqueue_resize()
git bisect bad 60602b367dc928e2b3d6c1f21df5d05f90e37fa3
# bad: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
git bisect bad 3a06353479111e1c9e072825bb0d0730e3a0f4e7
# first bad commit: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-06 17:20   ` Guenter Roeck
@ 2023-04-07  3:17     ` Xuan Zhuo
  2023-04-07 12:34       ` Guenter Roeck
  2023-04-07 11:02     ` Michael S. Tsirkin
  1 sibling, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-07  3:17 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Michael S. Tsirkin, virtualization

On Thu, 6 Apr 2023 10:20:09 -0700, Guenter Roeck <linux@roeck-us.net> wrote:
> Hi,
>
> On Mon, Mar 27, 2023 at 12:05:33PM +0800, Xuan Zhuo wrote:
> > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > caller can do dma operation in advance. The purpose is to keep memory
> > mapped across multiple add/get buf operations.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
>
> On sparc64, this patch results in
>
> [... sparc64 oops trace trimmed; see the full trace in the original report ...]
>
> Reverting it fixes the problem. Bisect log attached.
>
> The reason is that dma_supported() calls dma_4u_supported(), which
> expects that dev->archdata.iommu has been initialized. However,
> this is not the case for the virtio device. Result is a null pointer
> dereference in dma_4u_supported().
>
> static int dma_4u_supported(struct device *dev, u64 device_mask)
> {
>         struct iommu *iommu = dev->archdata.iommu;
>
>         if (ali_sound_dma_hack(dev, device_mask))
>                 return 1;
>
>         if (device_mask < iommu->dma_addr_mask)
> 	                  ^^^^^^^^^^^^^^^^^^^^ Crash location
>                 return 0;
>         return 1;
> }

Does sparc64 not support direct DMA?

The virtio device is a virtual device based on PCI.

Do you know how we should adapt to sparc64?

Thanks.



* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-06 17:20   ` Guenter Roeck
  2023-04-07  3:17     ` Xuan Zhuo
@ 2023-04-07 11:02     ` Michael S. Tsirkin
  2023-04-10  3:29       ` Xuan Zhuo
  1 sibling, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-04-07 11:02 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Xuan Zhuo, virtualization

On Thu, Apr 06, 2023 at 10:20:09AM -0700, Guenter Roeck wrote:
> Hi,
> 
> On Mon, Mar 27, 2023 at 12:05:33PM +0800, Xuan Zhuo wrote:
> > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > caller can do dma operation in advance. The purpose is to keep memory
> > mapped across multiple add/get buf operations.
> > 
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
> 
> On sparc64, this patch results in
> 
> [... sparc64 oops trace trimmed; see the full trace in the original report ...]
> 
> Reverting it fixes the problem. Bisect log attached.
> 
> The reason is that dma_supported() calls dma_4u_supported(), which
> expects that dev->archdata.iommu has been initialized. However,
> this is not the case for the virtio device. Result is a null pointer
> dereference in dma_4u_supported().
> 
> static int dma_4u_supported(struct device *dev, u64 device_mask)
> {
>         struct iommu *iommu = dev->archdata.iommu;
> 
>         if (ali_sound_dma_hack(dev, device_mask))
>                 return 1;
> 
>         if (device_mask < iommu->dma_addr_mask)
> 	                  ^^^^^^^^^^^^^^^^^^^^ Crash location
>                 return 0;
>         return 1;
> }
> 
> Guenter

This is from the unconditional call to dma_set_mask_and_coherent, right?

Xuan Zhuo, I feel there should be an explanation in the commit log
about making this unconditional call. Why are you making it
in probe?

I note that different virtio transports handle this differently.
And looking at this more, I feel this should maybe be reverted for now:

Your patch does:

@@ -243,6 +244,11 @@ static int virtio_dev_probe(struct device *_d)
        u64 driver_features;
        u64 driver_features_legacy;

+       _d->dma_mask = &_d->coherent_dma_mask;
+       err = dma_set_mask_and_coherent(_d, DMA_BIT_MASK(64));
+       if (err)
+               return err;
+
        /* We have a driver! */


but virtio PCI, for example, has a 32 bit fallback.

Legacy virtio in particular can't access 64 bit addresses at all,
so it does:

        rc = dma_set_mask(&pci_dev->dev, DMA_BIT_MASK(64));
        if (rc) {
                rc = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(32));
        } else {
                /*
                 * The virtio ring base address is expressed as a 32-bit PFN,
                 * with a page size of 1 << VIRTIO_PCI_QUEUE_ADDR_SHIFT.
                 */
                dma_set_coherent_mask(&pci_dev->dev,
                                DMA_BIT_MASK(32 + VIRTIO_PCI_QUEUE_ADDR_SHIFT));
        }


mmio tries a fallback to 32 bit if 64 bit fails, which does not make
sense to me, but maybe I am wrong.
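
The try-64-then-fall-back-to-32 pattern quoted above can be sketched in isolation. The model below is plain C with a hypothetical `set_mask()` standing in for `dma_set_mask()`/`dma_set_mask_and_coherent()`; it only illustrates the control flow, not the real DMA API:

```c
#include <assert.h>

/* Hypothetical model: the platform accepts masks up to its own limit. */
struct fake_dev {
	unsigned long long platform_limit;	/* widest mask allowed */
	unsigned long long mask;		/* mask actually set */
};

/* Stand-in for dma_set_mask(): fails if the mask is too wide. */
static int set_mask(struct fake_dev *d, unsigned long long mask)
{
	if (mask > d->platform_limit)
		return -1;
	d->mask = mask;
	return 0;
}

static unsigned long long mask_of(int bits)
{
	return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}

/* Mirrors the legacy virtio-PCI pattern quoted above: try a 64 bit
 * mask first, fall back to 32 bit if the platform rejects it. */
static int probe_dma_mask(struct fake_dev *d)
{
	if (set_mask(d, mask_of(64)) == 0)
		return 64;
	if (set_mask(d, mask_of(32)) == 0)
		return 32;
	return -1;
}
```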



_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-07  3:17     ` Xuan Zhuo
@ 2023-04-07 12:34       ` Guenter Roeck
  0 siblings, 0 replies; 57+ messages in thread
From: Guenter Roeck @ 2023-04-07 12:34 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Michael S. Tsirkin, virtualization

On 4/6/23 20:17, Xuan Zhuo wrote:
> On Thu, 6 Apr 2023 10:20:09 -0700, Guenter Roeck <linux@roeck-us.net> wrote:
>> Hi,
>>
>> On Mon, Mar 27, 2023 at 12:05:33PM +0800, Xuan Zhuo wrote:
>>> Added virtqueue_dma_dev() to get DMA device for virtio. Then the
>>> caller can do dma operation in advance. The purpose is to keep memory
>>> mapped across multiple add/get buf operations.
>>>
>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>
>> On sparc64, this patch results in
>>
>> [    4.364643] Unable to handle kernel NULL pointer dereference
>> [    4.365157] tsk->{mm,active_mm}->context = 0000000000000000
>> [    4.365394] tsk->{mm,active_mm}->pgd = fffff80000402000
>> [    4.365611]               \|/ ____ \|/
>> [    4.365611]               "@'/ .. \`@"
>> [    4.365611]               /_| \__/ |_\
>> [    4.365611]                  \__U_/
>> [    4.366165] swapper/0(1): Oops [#1]
>> [    4.366630] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.3.0-rc5-next-20230405 #1
>> [    4.367121] TSTATE: 0000004411001606 TPC: 00000000004375c0 TNPC: 00000000004375c4 Y: 00000002    Tainted: G                 N
>> [    4.367548] TPC: <dma_4u_supported+0x20/0x40>
>> [    4.368111] g0: 0000000000000000 g1: 0000000001a93a50 g2: 0000000000000000 g3: 0000000001a96170
>> [    4.368439] g4: fffff8000416e9a0 g5: fffff8001dc98000 g6: fffff80004170000 g7: 0000000000000005
>> [    4.368769] o0: 0000000000000000 o1: ffffffffffffffff o2: 0000000000000001 o3: fffff800040b78d8
>> [    4.369095] o4: 0000000000000000 o5: 0000000000000000 sp: fffff80004172d51 ret_pc: 00000000004375ac
>> [    4.369934] RPC: <dma_4u_supported+0xc/0x40>
>> [    4.370160] l0: 0000000000000002 l1: 0000000000000002 l2: 0000000000000000 l3: 00000000ffffffff
>> [    4.370493] l4: 0000000000000001 l5: 000000000119d2b0 l6: fffff8000416e9a0 l7: 00000000018df000
>> [    4.370820] i0: 0000000000000001 i1: ffffffffffffffff i2: fffff80004657758 i3: 000000000125ac00
>> [    4.371145] i4: 0000000002362400 i5: 0000000000000000 i6: fffff80004172e01 i7: 000000000050e288
>> [    4.371473] I7: <dma_set_mask+0x28/0x80>
>> [    4.371683] Call Trace:
>> [    4.371864] [<000000000050e288>] dma_set_mask+0x28/0x80
>> [    4.372123] [<0000000000a83234>] virtio_dev_probe+0x14/0x400
>> [    4.372348] [<0000000000ac7a18>] really_probe+0xb8/0x340
>> [    4.372555] [<0000000000ac7d64>] driver_probe_device+0x24/0x120
>> [    4.372794] [<0000000000ac8010>] __driver_attach+0x90/0x1a0
>> [    4.373012] [<0000000000ac5b4c>] bus_for_each_dev+0x4c/0xa0
>> [    4.373226] [<0000000000ac6d80>] bus_add_driver+0x140/0x1e0
>> [    4.373440] [<0000000000ac8d94>] driver_register+0x74/0x120
>> [    4.373653] [<0000000001b2a690>] virtio_net_driver_init+0x74/0xa8
>> [    4.373886] [<0000000000427ee0>] do_one_initcall+0x60/0x340
>> [    4.374099] [<0000000001b02f50>] kernel_init_freeable+0x1bc/0x228
>> [    4.374330] [<0000000000f37264>] kernel_init+0x18/0x134
>> [    4.374534] [<00000000004060e8>] ret_from_fork+0x1c/0x2c
>> [    4.374738] [<0000000000000000>] 0x0
>> [    4.374981] Disabling lock debugging due to kernel taint
>> [    4.375237] Caller[000000000050e288]: dma_set_mask+0x28/0x80
>> [    4.375477] Caller[0000000000a83234]: virtio_dev_probe+0x14/0x400
>> [    4.375704] Caller[0000000000ac7a18]: really_probe+0xb8/0x340
>> [    4.375916] Caller[0000000000ac7d64]: driver_probe_device+0x24/0x120
>> [    4.376146] Caller[0000000000ac8010]: __driver_attach+0x90/0x1a0
>> [    4.376365] Caller[0000000000ac5b4c]: bus_for_each_dev+0x4c/0xa0
>> [    4.376583] Caller[0000000000ac6d80]: bus_add_driver+0x140/0x1e0
>> [    4.376805] Caller[0000000000ac8d94]: driver_register+0x74/0x120
>> [    4.377025] Caller[0000000001b2a690]: virtio_net_driver_init+0x74/0xa8
>> [    4.377262] Caller[0000000000427ee0]: do_one_initcall+0x60/0x340
>> [    4.377480] Caller[0000000001b02f50]: kernel_init_freeable+0x1bc/0x228
>> [    4.377721] Caller[0000000000f37264]: kernel_init+0x18/0x134
>> [    4.377930] Caller[00000000004060e8]: ret_from_fork+0x1c/0x2c
>> [    4.378140] Caller[0000000000000000]: 0x0
>> [    4.378309] Instruction DUMP:
>> [    4.378339]  80a22000
>> [    4.378556]  12400006
>> [    4.378654]  b0102001
>> [    4.378746] <c2076658>
>> [    4.378838]  b0102000
>> [    4.378930]  80a04019
>> [    4.379022]  b1653001
>> [    4.379115]  81cfe008
>> [    4.379507]  913a2000
>> [    4.379617]
>> [    4.380504] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>>
>> Reverting it fixes the problem. Bisect log attached.
>>
>> The reason is that dma_supported() calls dma_4u_supported(), which
>> expects that dev->archdata.iommu has been initialized. However,
>> this is not the case for the virtio device. Result is a null pointer
>> dereference in dma_4u_supported().
>>
>> static int dma_4u_supported(struct device *dev, u64 device_mask)
>> {
>>          struct iommu *iommu = dev->archdata.iommu;
>>
>>          if (ali_sound_dma_hack(dev, device_mask))
>>                  return 1;
>>
>>          if (device_mask < iommu->dma_addr_mask)
>> 	                  ^^^^^^^^^^^^^^^^^^^^ Crash location
>>                  return 0;
>>          return 1;
>> }
> 
> Does sparc64 not support direct DMA?
> 
> The virtio device is a virtual device based on PCI.
> 
> Do you know how we should adapt to sparc64?
> 

It isn't exactly common or typical to set dev->dma_mask in driver probe
functions, as your code does, and much less so to do it unconditionally.
I don't know why the virtio driver is different enough to warrant setting
it there, so I cannot answer your question.

Guenter


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-07 11:02     ` Michael S. Tsirkin
@ 2023-04-10  3:29       ` Xuan Zhuo
  2023-04-10  5:14         ` Jason Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-10  3:29 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Guenter Roeck, virtualization

On Fri, 7 Apr 2023 07:02:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Apr 06, 2023 at 10:20:09AM -0700, Guenter Roeck wrote:
> > Hi,
> >
> > On Mon, Mar 27, 2023 at 12:05:33PM +0800, Xuan Zhuo wrote:
> > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > caller can do dma operation in advance. The purpose is to keep memory
> > > mapped across multiple add/get buf operations.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > Acked-by: Jason Wang <jasowang@redhat.com>
> >
> > On sparc64, this patch results in
> >
> > [    4.364643] Unable to handle kernel NULL pointer dereference
> > [    4.365157] tsk->{mm,active_mm}->context = 0000000000000000
> > [    4.365394] tsk->{mm,active_mm}->pgd = fffff80000402000
> > [    4.365611]               \|/ ____ \|/
> > [    4.365611]               "@'/ .. \`@"
> > [    4.365611]               /_| \__/ |_\
> > [    4.365611]                  \__U_/
> > [    4.366165] swapper/0(1): Oops [#1]
> > [    4.366630] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.3.0-rc5-next-20230405 #1
> > [    4.367121] TSTATE: 0000004411001606 TPC: 00000000004375c0 TNPC: 00000000004375c4 Y: 00000002    Tainted: G                 N
> > [    4.367548] TPC: <dma_4u_supported+0x20/0x40>
> > [    4.368111] g0: 0000000000000000 g1: 0000000001a93a50 g2: 0000000000000000 g3: 0000000001a96170
> > [    4.368439] g4: fffff8000416e9a0 g5: fffff8001dc98000 g6: fffff80004170000 g7: 0000000000000005
> > [    4.368769] o0: 0000000000000000 o1: ffffffffffffffff o2: 0000000000000001 o3: fffff800040b78d8
> > [    4.369095] o4: 0000000000000000 o5: 0000000000000000 sp: fffff80004172d51 ret_pc: 00000000004375ac
> > [    4.369934] RPC: <dma_4u_supported+0xc/0x40>
> > [    4.370160] l0: 0000000000000002 l1: 0000000000000002 l2: 0000000000000000 l3: 00000000ffffffff
> > [    4.370493] l4: 0000000000000001 l5: 000000000119d2b0 l6: fffff8000416e9a0 l7: 00000000018df000
> > [    4.370820] i0: 0000000000000001 i1: ffffffffffffffff i2: fffff80004657758 i3: 000000000125ac00
> > [    4.371145] i4: 0000000002362400 i5: 0000000000000000 i6: fffff80004172e01 i7: 000000000050e288
> > [    4.371473] I7: <dma_set_mask+0x28/0x80>
> > [    4.371683] Call Trace:
> > [    4.371864] [<000000000050e288>] dma_set_mask+0x28/0x80
> > [    4.372123] [<0000000000a83234>] virtio_dev_probe+0x14/0x400
> > [    4.372348] [<0000000000ac7a18>] really_probe+0xb8/0x340
> > [    4.372555] [<0000000000ac7d64>] driver_probe_device+0x24/0x120
> > [    4.372794] [<0000000000ac8010>] __driver_attach+0x90/0x1a0
> > [    4.373012] [<0000000000ac5b4c>] bus_for_each_dev+0x4c/0xa0
> > [    4.373226] [<0000000000ac6d80>] bus_add_driver+0x140/0x1e0
> > [    4.373440] [<0000000000ac8d94>] driver_register+0x74/0x120
> > [    4.373653] [<0000000001b2a690>] virtio_net_driver_init+0x74/0xa8
> > [    4.373886] [<0000000000427ee0>] do_one_initcall+0x60/0x340
> > [    4.374099] [<0000000001b02f50>] kernel_init_freeable+0x1bc/0x228
> > [    4.374330] [<0000000000f37264>] kernel_init+0x18/0x134
> > [    4.374534] [<00000000004060e8>] ret_from_fork+0x1c/0x2c
> > [    4.374738] [<0000000000000000>] 0x0
> > [    4.374981] Disabling lock debugging due to kernel taint
> > [    4.375237] Caller[000000000050e288]: dma_set_mask+0x28/0x80
> > [    4.375477] Caller[0000000000a83234]: virtio_dev_probe+0x14/0x400
> > [    4.375704] Caller[0000000000ac7a18]: really_probe+0xb8/0x340
> > [    4.375916] Caller[0000000000ac7d64]: driver_probe_device+0x24/0x120
> > [    4.376146] Caller[0000000000ac8010]: __driver_attach+0x90/0x1a0
> > [    4.376365] Caller[0000000000ac5b4c]: bus_for_each_dev+0x4c/0xa0
> > [    4.376583] Caller[0000000000ac6d80]: bus_add_driver+0x140/0x1e0
> > [    4.376805] Caller[0000000000ac8d94]: driver_register+0x74/0x120
> > [    4.377025] Caller[0000000001b2a690]: virtio_net_driver_init+0x74/0xa8
> > [    4.377262] Caller[0000000000427ee0]: do_one_initcall+0x60/0x340
> > [    4.377480] Caller[0000000001b02f50]: kernel_init_freeable+0x1bc/0x228
> > [    4.377721] Caller[0000000000f37264]: kernel_init+0x18/0x134
> > [    4.377930] Caller[00000000004060e8]: ret_from_fork+0x1c/0x2c
> > [    4.378140] Caller[0000000000000000]: 0x0
> > [    4.378309] Instruction DUMP:
> > [    4.378339]  80a22000
> > [    4.378556]  12400006
> > [    4.378654]  b0102001
> > [    4.378746] <c2076658>
> > [    4.378838]  b0102000
> > [    4.378930]  80a04019
> > [    4.379022]  b1653001
> > [    4.379115]  81cfe008
> > [    4.379507]  913a2000
> > [    4.379617]
> > [    4.380504] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> >
> > Reverting it fixes the problem. Bisect log attached.
> >
> > The reason is that dma_supported() calls dma_4u_supported(), which
> > expects that dev->archdata.iommu has been initialized. However,
> > this is not the case for the virtio device. Result is a null pointer
> > dereference in dma_4u_supported().
> >
> > static int dma_4u_supported(struct device *dev, u64 device_mask)
> > {
> >         struct iommu *iommu = dev->archdata.iommu;
> >
> >         if (ali_sound_dma_hack(dev, device_mask))
> >                 return 1;
> >
> >         if (device_mask < iommu->dma_addr_mask)
> > 	                  ^^^^^^^^^^^^^^^^^^^^ Crash location
> >                 return 0;
> >         return 1;
> > }
> >
> > Guenter
>
> This is from the unconditional call to dma_set_mask_and_coherent, right?

Maybe not.

The problem is that dma_supported() ends up calling dma_4u_supported(),
which assumes the IOMMU has already been initialized.

We hoped to turn the virtio device into a direct-DMA device by leaving
dev->dma_ops NULL.

The first step in the DMA framework is obtaining the ops. On many platforms
get_arch_dma_ops() returns NULL, and when the ops are NULL the framework
falls back to direct DMA:

static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
{
	if (dev->dma_ops)
		return dev->dma_ops;
	return get_arch_dma_ops();
}

But on reflection this is unreliable: some platforms, including x86,
return their own ops from get_arch_dma_ops().

Since dev->dma_ops takes priority over get_arch_dma_ops(), we should not
rely on dev->dma_ops being NULL; instead we should define a set of dma_ops
that explicitly implements direct DMA.


>
> > Xuan Zhuo, I feel there should be an explanation in the commit log
> about making this unconditional call. Why are you making it
> in probe?
>
> I note that different virtio transports handle this differently.
> And looking at this more I feel this should maybe be reverted for now:
>
> Your patch does:
>
> @@ -243,6 +244,11 @@ static int virtio_dev_probe(struct device *_d)
>         u64 driver_features;
>         u64 driver_features_legacy;
>
> +       _d->dma_mask = &_d->coherent_dma_mask;
> +       err = dma_set_mask_and_coherent(_d, DMA_BIT_MASK(64));
> +       if (err)
> +               return err;
> +
>         /* We have a driver! */
>
>
> but for example virtio PCI has a 32 bit fallback.
>
> For example Virtio legacy also can't access 64 bit addresses at all,
> so it does:
>
>         rc = dma_set_mask(&pci_dev->dev, DMA_BIT_MASK(64));

dma_set_mask() will also call dma_supported().


>         if (rc) {
>                 rc = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(32));
>         } else {
>                 /*
>                  * The virtio ring base address is expressed as a 32-bit PFN,
>                  * with a page size of 1 << VIRTIO_PCI_QUEUE_ADDR_SHIFT.
>                  */
>                 dma_set_coherent_mask(&pci_dev->dev,
>                                 DMA_BIT_MASK(32 + VIRTIO_PCI_QUEUE_ADDR_SHIFT));
>         }
>
>
> mmio tries a fallback to 32 bit if 64 bit fails which does not make
> sense to me, but maybe I am wrong.
>
>

Those physical devices sit behind an IOMMU, so this works for them, but a
virtio device does not have any physical IOMMU, which is where the problems
come from. I borrowed this approach from vDPA, so I think vDPA has similar
problems.


Thanks.


>
>
>
>
> > ---
> > # bad: [8417c8f5007bf4567ccffda850a3157c7d905f67] Add linux-next specific files for 20230405
> > # good: [7e364e56293bb98cae1b55fd835f5991c4e96e7d] Linux 6.3-rc5
> > git bisect start 'HEAD' 'v6.3-rc5'
> > # good: [699c146eb5a03458f29e94cfde4d7dd7a36deeb4] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> > git bisect good 699c146eb5a03458f29e94cfde4d7dd7a36deeb4
> > # good: [efe74e6476a9f04263288009910f07a26597386f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
> > git bisect good efe74e6476a9f04263288009910f07a26597386f
> > # good: [24be607eedb226c1627973190a0b65cab39440b9] Merge branch 'char-misc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
> > git bisect good 24be607eedb226c1627973190a0b65cab39440b9
> > # good: [5213285e3f212cf266c085c1c11041adda2bc63f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
> > git bisect good 5213285e3f212cf266c085c1c11041adda2bc63f
> > # bad: [e8f5d7e1787104da5977773ba5b3e1d502fdb9e3] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm.git
> > git bisect bad e8f5d7e1787104da5977773ba5b3e1d502fdb9e3
> > # bad: [c08b5ad3300122790cef1bf7c1f51c554c778e4d] Merge branch 'gpio/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
> > git bisect bad c08b5ad3300122790cef1bf7c1f51c554c778e4d
> > # good: [4ea0c97776bf8c63805eb0f8182d9c20072219d9] gpiolib: Check array_info for NULL only once in gpiod_get_array()
> > git bisect good 4ea0c97776bf8c63805eb0f8182d9c20072219d9
> > # good: [b8e4bb0f05bef8334e689618c75cfed122f3a292] vdpa_sim: use kthread worker
> > git bisect good b8e4bb0f05bef8334e689618c75cfed122f3a292
> > # bad: [ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
> > git bisect bad ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e
> > # good: [56c8cf37fb0800ad66c926eb3d044e50691afa6e] virtio_ring: packed: separate dma codes
> > git bisect good 56c8cf37fb0800ad66c926eb3d044e50691afa6e
> > # good: [f19986b93ab7aa23f76dedf9fd8657f865324f78] virtio_ring: update document for virtqueue_add_*
> > git bisect good f19986b93ab7aa23f76dedf9fd8657f865324f78
> > # bad: [60602b367dc928e2b3d6c1f21df5d05f90e37fa3] virtio_ring: correct the expression of the description of virtqueue_resize()
> > git bisect bad 60602b367dc928e2b3d6c1f21df5d05f90e37fa3
> > # bad: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
> > git bisect bad 3a06353479111e1c9e072825bb0d0730e3a0f4e7
> > # first bad commit: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
>

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-10  3:29       ` Xuan Zhuo
@ 2023-04-10  5:14         ` Jason Wang
  2023-04-10  6:03           ` Xuan Zhuo
                             ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Jason Wang @ 2023-04-10  5:14 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin

On Mon, Apr 10, 2023 at 11:48 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 7 Apr 2023 07:02:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Apr 06, 2023 at 10:20:09AM -0700, Guenter Roeck wrote:
> > > Hi,
> > >
> > > On Mon, Mar 27, 2023 at 12:05:33PM +0800, Xuan Zhuo wrote:
> > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > mapped across multiple add/get buf operations.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > >
> > > On sparc64, this patch results in
> > >
> > > [    4.364643] Unable to handle kernel NULL pointer dereference
> > > [    4.365157] tsk->{mm,active_mm}->context = 0000000000000000
> > > [    4.365394] tsk->{mm,active_mm}->pgd = fffff80000402000
> > > [    4.365611]               \|/ ____ \|/
> > > [    4.365611]               "@'/ .. \`@"
> > > [    4.365611]               /_| \__/ |_\
> > > [    4.365611]                  \__U_/
> > > [    4.366165] swapper/0(1): Oops [#1]
> > > [    4.366630] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.3.0-rc5-next-20230405 #1
> > > [    4.367121] TSTATE: 0000004411001606 TPC: 00000000004375c0 TNPC: 00000000004375c4 Y: 00000002    Tainted: G                 N
> > > [    4.367548] TPC: <dma_4u_supported+0x20/0x40>
> > > [    4.368111] g0: 0000000000000000 g1: 0000000001a93a50 g2: 0000000000000000 g3: 0000000001a96170
> > > [    4.368439] g4: fffff8000416e9a0 g5: fffff8001dc98000 g6: fffff80004170000 g7: 0000000000000005
> > > [    4.368769] o0: 0000000000000000 o1: ffffffffffffffff o2: 0000000000000001 o3: fffff800040b78d8
> > > [    4.369095] o4: 0000000000000000 o5: 0000000000000000 sp: fffff80004172d51 ret_pc: 00000000004375ac
> > > [    4.369934] RPC: <dma_4u_supported+0xc/0x40>
> > > [    4.370160] l0: 0000000000000002 l1: 0000000000000002 l2: 0000000000000000 l3: 00000000ffffffff
> > > [    4.370493] l4: 0000000000000001 l5: 000000000119d2b0 l6: fffff8000416e9a0 l7: 00000000018df000
> > > [    4.370820] i0: 0000000000000001 i1: ffffffffffffffff i2: fffff80004657758 i3: 000000000125ac00
> > > [    4.371145] i4: 0000000002362400 i5: 0000000000000000 i6: fffff80004172e01 i7: 000000000050e288
> > > [    4.371473] I7: <dma_set_mask+0x28/0x80>
> > > [    4.371683] Call Trace:
> > > [    4.371864] [<000000000050e288>] dma_set_mask+0x28/0x80
> > > [    4.372123] [<0000000000a83234>] virtio_dev_probe+0x14/0x400
> > > [    4.372348] [<0000000000ac7a18>] really_probe+0xb8/0x340
> > > [    4.372555] [<0000000000ac7d64>] driver_probe_device+0x24/0x120
> > > [    4.372794] [<0000000000ac8010>] __driver_attach+0x90/0x1a0
> > > [    4.373012] [<0000000000ac5b4c>] bus_for_each_dev+0x4c/0xa0
> > > [    4.373226] [<0000000000ac6d80>] bus_add_driver+0x140/0x1e0
> > > [    4.373440] [<0000000000ac8d94>] driver_register+0x74/0x120
> > > [    4.373653] [<0000000001b2a690>] virtio_net_driver_init+0x74/0xa8
> > > [    4.373886] [<0000000000427ee0>] do_one_initcall+0x60/0x340
> > > [    4.374099] [<0000000001b02f50>] kernel_init_freeable+0x1bc/0x228
> > > [    4.374330] [<0000000000f37264>] kernel_init+0x18/0x134
> > > [    4.374534] [<00000000004060e8>] ret_from_fork+0x1c/0x2c
> > > [    4.374738] [<0000000000000000>] 0x0
> > > [    4.374981] Disabling lock debugging due to kernel taint
> > > [    4.375237] Caller[000000000050e288]: dma_set_mask+0x28/0x80
> > > [    4.375477] Caller[0000000000a83234]: virtio_dev_probe+0x14/0x400
> > > [    4.375704] Caller[0000000000ac7a18]: really_probe+0xb8/0x340
> > > [    4.375916] Caller[0000000000ac7d64]: driver_probe_device+0x24/0x120
> > > [    4.376146] Caller[0000000000ac8010]: __driver_attach+0x90/0x1a0
> > > [    4.376365] Caller[0000000000ac5b4c]: bus_for_each_dev+0x4c/0xa0
> > > [    4.376583] Caller[0000000000ac6d80]: bus_add_driver+0x140/0x1e0
> > > [    4.376805] Caller[0000000000ac8d94]: driver_register+0x74/0x120
> > > [    4.377025] Caller[0000000001b2a690]: virtio_net_driver_init+0x74/0xa8
> > > [    4.377262] Caller[0000000000427ee0]: do_one_initcall+0x60/0x340
> > > [    4.377480] Caller[0000000001b02f50]: kernel_init_freeable+0x1bc/0x228
> > > [    4.377721] Caller[0000000000f37264]: kernel_init+0x18/0x134
> > > [    4.377930] Caller[00000000004060e8]: ret_from_fork+0x1c/0x2c
> > > [    4.378140] Caller[0000000000000000]: 0x0
> > > [    4.378309] Instruction DUMP:
> > > [    4.378339]  80a22000
> > > [    4.378556]  12400006
> > > [    4.378654]  b0102001
> > > [    4.378746] <c2076658>
> > > [    4.378838]  b0102000
> > > [    4.378930]  80a04019
> > > [    4.379022]  b1653001
> > > [    4.379115]  81cfe008
> > > [    4.379507]  913a2000
> > > [    4.379617]
> > > [    4.380504] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> > >
> > > Reverting it fixes the problem. Bisect log attached.
> > >
> > > The reason is that dma_supported() calls dma_4u_supported(), which
> > > expects that dev->archdata.iommu has been initialized. However,
> > > this is not the case for the virtio device. Result is a null pointer
> > > dereference in dma_4u_supported().
> > >
> > > static int dma_4u_supported(struct device *dev, u64 device_mask)
> > > {
> > >         struct iommu *iommu = dev->archdata.iommu;
> > >
> > >         if (ali_sound_dma_hack(dev, device_mask))
> > >                 return 1;
> > >
> > >         if (device_mask < iommu->dma_addr_mask)
> > >                       ^^^^^^^^^^^^^^^^^^^^ Crash location
> > >                 return 0;
> > >         return 1;
> > > }
> > >
> > > Guenter
> >
> > This is from the unconditional call to dma_set_mask_and_coherent, right?
>
> Maybe not.
>
> The problem is that dma_supported() ends up calling dma_4u_supported(),
> which assumes the IOMMU has already been initialized.
>
> We hoped to turn the virtio device into a direct-DMA device by leaving
> dev->dma_ops NULL.
>
> The first step in the DMA framework is obtaining the ops. On many platforms
> get_arch_dma_ops() returns NULL, and when the ops are NULL the framework
> falls back to direct DMA:
>
> static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
> {
>         if (dev->dma_ops)
>                 return dev->dma_ops;
>         return get_arch_dma_ops();
> }
>
> But on reflection this is unreliable: some platforms, including x86,
> return their own ops from get_arch_dma_ops().
>
> Since dev->dma_ops takes priority over get_arch_dma_ops(), we should not
> rely on dev->dma_ops being NULL; instead we should define a set of dma_ops
> that explicitly implements direct DMA.

Then this would duplicate the existing DMA API helpers. Could we tweak
sparc, or introduce a new flag in the device structure so that the
DMA API knows it needs to use a direct mapping?

Adding Christoph for more comments.

>
>
> >
> > Xuan Zhuo, I feel there should be an explanation in the commit log
> > about making this unconditional call. Why are you making it
> > in probe?
> >
> > I note that different virtio transports handle this differently.
> > And looking at this more I feel this should maybe be reverted for now:
> >
> > Your patch does:
> >
> > @@ -243,6 +244,11 @@ static int virtio_dev_probe(struct device *_d)
> >         u64 driver_features;
> >         u64 driver_features_legacy;
> >
> > +       _d->dma_mask = &_d->coherent_dma_mask;
> > +       err = dma_set_mask_and_coherent(_d, DMA_BIT_MASK(64));
> > +       if (err)
> > +               return err;
> > +
> >         /* We have a driver! */
> >
> >
> > but for example virtio PCI has a 32 bit fallback.
> >
> > For example Virtio legacy also can't access 64 bit addresses at all,
> > so it does:
> >
> >         rc = dma_set_mask(&pci_dev->dev, DMA_BIT_MASK(64));
>
> dma_set_mask() will also call dma_supported().
>
>
> >         if (rc) {
> >                 rc = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(32));
> >         } else {
> >                 /*
> >                  * The virtio ring base address is expressed as a 32-bit PFN,
> >                  * with a page size of 1 << VIRTIO_PCI_QUEUE_ADDR_SHIFT.
> >                  */
> >                 dma_set_coherent_mask(&pci_dev->dev,
> >                                 DMA_BIT_MASK(32 + VIRTIO_PCI_QUEUE_ADDR_SHIFT));
> >         }
> >
> >
> > mmio tries a fallback to 32 bit if 64 bit fails which does not make
> > sense to me, but maybe I am wrong.
> >
> >
>
> These physical devices have IOMMU, so it is normal, but Virtio Dev does not have
> any physical IOMMU, so there are problems. I borrow from VDPA. So I think
> there are similar problems with VDPA.

Yes.

Thanks

>
>
> Thanks.
>
>
> >
> >
> >
> >
> > > ---
> > > # bad: [8417c8f5007bf4567ccffda850a3157c7d905f67] Add linux-next specific files for 20230405
> > > # good: [7e364e56293bb98cae1b55fd835f5991c4e96e7d] Linux 6.3-rc5
> > > git bisect start 'HEAD' 'v6.3-rc5'
> > > # good: [699c146eb5a03458f29e94cfde4d7dd7a36deeb4] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> > > git bisect good 699c146eb5a03458f29e94cfde4d7dd7a36deeb4
> > > # good: [efe74e6476a9f04263288009910f07a26597386f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
> > > git bisect good efe74e6476a9f04263288009910f07a26597386f
> > > # good: [24be607eedb226c1627973190a0b65cab39440b9] Merge branch 'char-misc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
> > > git bisect good 24be607eedb226c1627973190a0b65cab39440b9
> > > # good: [5213285e3f212cf266c085c1c11041adda2bc63f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
> > > git bisect good 5213285e3f212cf266c085c1c11041adda2bc63f
> > > # bad: [e8f5d7e1787104da5977773ba5b3e1d502fdb9e3] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm.git
> > > git bisect bad e8f5d7e1787104da5977773ba5b3e1d502fdb9e3
> > > # bad: [c08b5ad3300122790cef1bf7c1f51c554c778e4d] Merge branch 'gpio/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
> > > git bisect bad c08b5ad3300122790cef1bf7c1f51c554c778e4d
> > > # good: [4ea0c97776bf8c63805eb0f8182d9c20072219d9] gpiolib: Check array_info for NULL only once in gpiod_get_array()
> > > git bisect good 4ea0c97776bf8c63805eb0f8182d9c20072219d9
> > > # good: [b8e4bb0f05bef8334e689618c75cfed122f3a292] vdpa_sim: use kthread worker
> > > git bisect good b8e4bb0f05bef8334e689618c75cfed122f3a292
> > > # bad: [ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
> > > git bisect bad ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e
> > > # good: [56c8cf37fb0800ad66c926eb3d044e50691afa6e] virtio_ring: packed: separate dma codes
> > > git bisect good 56c8cf37fb0800ad66c926eb3d044e50691afa6e
> > > # good: [f19986b93ab7aa23f76dedf9fd8657f865324f78] virtio_ring: update document for virtqueue_add_*
> > > git bisect good f19986b93ab7aa23f76dedf9fd8657f865324f78
> > > # bad: [60602b367dc928e2b3d6c1f21df5d05f90e37fa3] virtio_ring: correct the expression of the description of virtqueue_resize()
> > > git bisect bad 60602b367dc928e2b3d6c1f21df5d05f90e37fa3
> > > # bad: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
> > > git bisect bad 3a06353479111e1c9e072825bb0d0730e3a0f4e7
> > > # first bad commit: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
> >
>


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-10  5:14         ` Jason Wang
@ 2023-04-10  6:03           ` Xuan Zhuo
  2023-04-10  6:40             ` Michael S. Tsirkin
  2023-04-10 15:27           ` Christoph Hellwig
  2023-04-11  7:12           ` Xuan Zhuo
  2 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-10  6:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin

On Mon, 10 Apr 2023 13:14:13 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 10, 2023 at 11:48 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 7 Apr 2023 07:02:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, Apr 06, 2023 at 10:20:09AM -0700, Guenter Roeck wrote:
> > > > Hi,
> > > >
> > > > On Mon, Mar 27, 2023 at 12:05:33PM +0800, Xuan Zhuo wrote:
> > > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > > mapped across multiple add/get buf operations.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > > On sparc64, this patch results in
> > > >
> > > > [    4.364643] Unable to handle kernel NULL pointer dereference
> > > > [    4.365157] tsk->{mm,active_mm}->context = 0000000000000000
> > > > [    4.365394] tsk->{mm,active_mm}->pgd = fffff80000402000
> > > > [    4.365611]               \|/ ____ \|/
> > > > [    4.365611]               "@'/ .. \`@"
> > > > [    4.365611]               /_| \__/ |_\
> > > > [    4.365611]                  \__U_/
> > > > [    4.366165] swapper/0(1): Oops [#1]
> > > > [    4.366630] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.3.0-rc5-next-20230405 #1
> > > > [    4.367121] TSTATE: 0000004411001606 TPC: 00000000004375c0 TNPC: 00000000004375c4 Y: 00000002    Tainted: G                 N
> > > > [    4.367548] TPC: <dma_4u_supported+0x20/0x40>
> > > > [    4.368111] g0: 0000000000000000 g1: 0000000001a93a50 g2: 0000000000000000 g3: 0000000001a96170
> > > > [    4.368439] g4: fffff8000416e9a0 g5: fffff8001dc98000 g6: fffff80004170000 g7: 0000000000000005
> > > > [    4.368769] o0: 0000000000000000 o1: ffffffffffffffff o2: 0000000000000001 o3: fffff800040b78d8
> > > > [    4.369095] o4: 0000000000000000 o5: 0000000000000000 sp: fffff80004172d51 ret_pc: 00000000004375ac
> > > > [    4.369934] RPC: <dma_4u_supported+0xc/0x40>
> > > > [    4.370160] l0: 0000000000000002 l1: 0000000000000002 l2: 0000000000000000 l3: 00000000ffffffff
> > > > [    4.370493] l4: 0000000000000001 l5: 000000000119d2b0 l6: fffff8000416e9a0 l7: 00000000018df000
> > > > [    4.370820] i0: 0000000000000001 i1: ffffffffffffffff i2: fffff80004657758 i3: 000000000125ac00
> > > > [    4.371145] i4: 0000000002362400 i5: 0000000000000000 i6: fffff80004172e01 i7: 000000000050e288
> > > > [    4.371473] I7: <dma_set_mask+0x28/0x80>
> > > > [    4.371683] Call Trace:
> > > > [    4.371864] [<000000000050e288>] dma_set_mask+0x28/0x80
> > > > [    4.372123] [<0000000000a83234>] virtio_dev_probe+0x14/0x400
> > > > [    4.372348] [<0000000000ac7a18>] really_probe+0xb8/0x340
> > > > [    4.372555] [<0000000000ac7d64>] driver_probe_device+0x24/0x120
> > > > [    4.372794] [<0000000000ac8010>] __driver_attach+0x90/0x1a0
> > > > [    4.373012] [<0000000000ac5b4c>] bus_for_each_dev+0x4c/0xa0
> > > > [    4.373226] [<0000000000ac6d80>] bus_add_driver+0x140/0x1e0
> > > > [    4.373440] [<0000000000ac8d94>] driver_register+0x74/0x120
> > > > [    4.373653] [<0000000001b2a690>] virtio_net_driver_init+0x74/0xa8
> > > > [    4.373886] [<0000000000427ee0>] do_one_initcall+0x60/0x340
> > > > [    4.374099] [<0000000001b02f50>] kernel_init_freeable+0x1bc/0x228
> > > > [    4.374330] [<0000000000f37264>] kernel_init+0x18/0x134
> > > > [    4.374534] [<00000000004060e8>] ret_from_fork+0x1c/0x2c
> > > > [    4.374738] [<0000000000000000>] 0x0
> > > > [    4.374981] Disabling lock debugging due to kernel taint
> > > > [    4.375237] Caller[000000000050e288]: dma_set_mask+0x28/0x80
> > > > [    4.375477] Caller[0000000000a83234]: virtio_dev_probe+0x14/0x400
> > > > [    4.375704] Caller[0000000000ac7a18]: really_probe+0xb8/0x340
> > > > [    4.375916] Caller[0000000000ac7d64]: driver_probe_device+0x24/0x120
> > > > [    4.376146] Caller[0000000000ac8010]: __driver_attach+0x90/0x1a0
> > > > [    4.376365] Caller[0000000000ac5b4c]: bus_for_each_dev+0x4c/0xa0
> > > > [    4.376583] Caller[0000000000ac6d80]: bus_add_driver+0x140/0x1e0
> > > > [    4.376805] Caller[0000000000ac8d94]: driver_register+0x74/0x120
> > > > [    4.377025] Caller[0000000001b2a690]: virtio_net_driver_init+0x74/0xa8
> > > > [    4.377262] Caller[0000000000427ee0]: do_one_initcall+0x60/0x340
> > > > [    4.377480] Caller[0000000001b02f50]: kernel_init_freeable+0x1bc/0x228
> > > > [    4.377721] Caller[0000000000f37264]: kernel_init+0x18/0x134
> > > > [    4.377930] Caller[00000000004060e8]: ret_from_fork+0x1c/0x2c
> > > > [    4.378140] Caller[0000000000000000]: 0x0
> > > > [    4.378309] Instruction DUMP:
> > > > [    4.378339]  80a22000
> > > > [    4.378556]  12400006
> > > > [    4.378654]  b0102001
> > > > [    4.378746] <c2076658>
> > > > [    4.378838]  b0102000
> > > > [    4.378930]  80a04019
> > > > [    4.379022]  b1653001
> > > > [    4.379115]  81cfe008
> > > > [    4.379507]  913a2000
> > > > [    4.379617]
> > > > [    4.380504] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> > > >
> > > > Reverting it fixes the problem. Bisect log attached.
> > > >
> > > > The reason is that dma_supported() calls dma_4u_supported(), which
> > > > expects that dev->archdata.iommu has been initialized. However,
> > > > this is not the case for the virtio device. Result is a null pointer
> > > > dereference in dma_4u_supported().
> > > >
> > > > static int dma_4u_supported(struct device *dev, u64 device_mask)
> > > > {
> > > >         struct iommu *iommu = dev->archdata.iommu;
> > > >
> > > >         if (ali_sound_dma_hack(dev, device_mask))
> > > >                 return 1;
> > > >
> > > >         if (device_mask < iommu->dma_addr_mask)
> > > >                       ^^^^^^^^^^^^^^^^^^^^ Crash location
> > > >                 return 0;
> > > >         return 1;
> > > > }
> > > >
> > > > Guenter
> > >
> > > This is from the unconditional call to dma_set_mask_and_coherent, right?
> >
> > Maybe not.
> >
> > The problem is that dma_4u_supported() is called from dma_supported(),
> > and it assumes the IOMMU has already been initialized.
> >
> > We hoped to make the virtio device use direct DMA by leaving
> > dev->dma_ops NULL.
> >
> > The first step in the DMA framework is obtaining the ops. In many cases,
> > get_arch_dma_ops() returns NULL, and when the ops are NULL the framework
> > falls back to direct DMA.
> >
> > static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
> > {
> >         if (dev->dma_ops)
> >                 return dev->dma_ops;
> >         return get_arch_dma_ops();
> > }
> >
> > But on reflection, this is unreliable: some platforms return their own
> > arch ops, including x86.
> >
> > Because dev->dma_ops takes priority over get_arch_dma_ops(), we should
> > not leave dev->dma_ops NULL. Instead, we should define a set of dma_ops
> > that explicitly implements direct DMA.
>
> Then this would duplicate the existing DMA API helpers. Could we tweak
> sparc, or introduce a new flag in the device structure so the DMA API
> knows it needs to use direct mapping?

Adding a new flag would be better. This is not a sparc-specific problem;
it may occur on x86 as well.

Thanks.


>
> Adding Christoph for more comments.
>
> >
> >
> > >
> > > Xuan Zhuo, I feel there should be an explanation in the commit log
> > > about making this unconditional call. Why are you making it
> > > in probe?
> > >
> > > I note that different virtio transports handle this differently.
> > > And looking at this more I feel this should maybe be reverted for now:
> > >
> > > Your patch does:
> > >
> > > @@ -243,6 +244,11 @@ static int virtio_dev_probe(struct device *_d)
> > >         u64 driver_features;
> > >         u64 driver_features_legacy;
> > >
> > > +       _d->dma_mask = &_d->coherent_dma_mask;
> > > +       err = dma_set_mask_and_coherent(_d, DMA_BIT_MASK(64));
> > > +       if (err)
> > > +               return err;
> > > +
> > >         /* We have a driver! */
> > >
> > >
> > > but for example virtio PCI has a 32 bit fallback.
> > >
> > > For example Virtio legacy also can't access 64 bit addresses at all,
> > > so it does:
> > >
> > >         rc = dma_set_mask(&pci_dev->dev, DMA_BIT_MASK(64));
> >
> > dma_set_mask() will also call dma_supported().
> >
> >
> > >         if (rc) {
> > >                 rc = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(32));
> > >         } else {
> > >                 /*
> > >                  * The virtio ring base address is expressed as a 32-bit PFN,
> > >                  * with a page size of 1 << VIRTIO_PCI_QUEUE_ADDR_SHIFT.
> > >                  */
> > >                 dma_set_coherent_mask(&pci_dev->dev,
> > >                                 DMA_BIT_MASK(32 + VIRTIO_PCI_QUEUE_ADDR_SHIFT));
> > >         }
> > >
> > >
> > > mmio tries a fallback to 32 bit if 64 bit fails which does not make
> > > sense to me, but maybe I am wrong.
> > >
> > >
> >
> > These physical devices sit behind an IOMMU, so this is fine for them,
> > but a virtio device has no physical IOMMU, which is where the problem
> > comes from. I borrowed this approach from vDPA, so I suspect vDPA has
> > similar problems.
>
> Yes.
>
> Thanks
>
> >
> >
> > Thanks.
> >
> >
> > >
> > >
> > >
> > >
> > > > ---
> > > > # bad: [8417c8f5007bf4567ccffda850a3157c7d905f67] Add linux-next specific files for 20230405
> > > > # good: [7e364e56293bb98cae1b55fd835f5991c4e96e7d] Linux 6.3-rc5
> > > > git bisect start 'HEAD' 'v6.3-rc5'
> > > > # good: [699c146eb5a03458f29e94cfde4d7dd7a36deeb4] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> > > > git bisect good 699c146eb5a03458f29e94cfde4d7dd7a36deeb4
> > > > # good: [efe74e6476a9f04263288009910f07a26597386f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
> > > > git bisect good efe74e6476a9f04263288009910f07a26597386f
> > > > # good: [24be607eedb226c1627973190a0b65cab39440b9] Merge branch 'char-misc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
> > > > git bisect good 24be607eedb226c1627973190a0b65cab39440b9
> > > > # good: [5213285e3f212cf266c085c1c11041adda2bc63f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
> > > > git bisect good 5213285e3f212cf266c085c1c11041adda2bc63f
> > > > # bad: [e8f5d7e1787104da5977773ba5b3e1d502fdb9e3] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm.git
> > > > git bisect bad e8f5d7e1787104da5977773ba5b3e1d502fdb9e3
> > > > # bad: [c08b5ad3300122790cef1bf7c1f51c554c778e4d] Merge branch 'gpio/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
> > > > git bisect bad c08b5ad3300122790cef1bf7c1f51c554c778e4d
> > > > # good: [4ea0c97776bf8c63805eb0f8182d9c20072219d9] gpiolib: Check array_info for NULL only once in gpiod_get_array()
> > > > git bisect good 4ea0c97776bf8c63805eb0f8182d9c20072219d9
> > > > # good: [b8e4bb0f05bef8334e689618c75cfed122f3a292] vdpa_sim: use kthread worker
> > > > git bisect good b8e4bb0f05bef8334e689618c75cfed122f3a292
> > > > # bad: [ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
> > > > git bisect bad ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e
> > > > # good: [56c8cf37fb0800ad66c926eb3d044e50691afa6e] virtio_ring: packed: separate dma codes
> > > > git bisect good 56c8cf37fb0800ad66c926eb3d044e50691afa6e
> > > > # good: [f19986b93ab7aa23f76dedf9fd8657f865324f78] virtio_ring: update document for virtqueue_add_*
> > > > git bisect good f19986b93ab7aa23f76dedf9fd8657f865324f78
> > > > # bad: [60602b367dc928e2b3d6c1f21df5d05f90e37fa3] virtio_ring: correct the expression of the description of virtqueue_resize()
> > > > git bisect bad 60602b367dc928e2b3d6c1f21df5d05f90e37fa3
> > > > # bad: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
> > > > git bisect bad 3a06353479111e1c9e072825bb0d0730e3a0f4e7
> > > > # first bad commit: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
> > >
> >
>


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-10  6:03           ` Xuan Zhuo
@ 2023-04-10  6:40             ` Michael S. Tsirkin
  2023-04-10  6:42               ` Xuan Zhuo
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-04-10  6:40 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Christoph Hellwig, Guenter Roeck, virtualization

On Mon, Apr 10, 2023 at 02:03:37PM +0800, Xuan Zhuo wrote:
> On Mon, 10 Apr 2023 13:14:13 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Apr 10, 2023 at 11:48 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Fri, 7 Apr 2023 07:02:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Thu, Apr 06, 2023 at 10:20:09AM -0700, Guenter Roeck wrote:
> > > > > Hi,
> > > > >
> > > > > On Mon, Mar 27, 2023 at 12:05:33PM +0800, Xuan Zhuo wrote:
> > > > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > > > mapped across multiple add/get buf operations.
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > >
> > > > > On sparc64, this patch results in
> > > > >
> > > > > [    4.364643] Unable to handle kernel NULL pointer dereference
> > > > > [    4.365157] tsk->{mm,active_mm}->context = 0000000000000000
> > > > > [    4.365394] tsk->{mm,active_mm}->pgd = fffff80000402000
> > > > > [    4.365611]               \|/ ____ \|/
> > > > > [    4.365611]               "@'/ .. \`@"
> > > > > [    4.365611]               /_| \__/ |_\
> > > > > [    4.365611]                  \__U_/
> > > > > [    4.366165] swapper/0(1): Oops [#1]
> > > > > [    4.366630] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.3.0-rc5-next-20230405 #1
> > > > > [    4.367121] TSTATE: 0000004411001606 TPC: 00000000004375c0 TNPC: 00000000004375c4 Y: 00000002    Tainted: G                 N
> > > > > [    4.367548] TPC: <dma_4u_supported+0x20/0x40>
> > > > > [    4.368111] g0: 0000000000000000 g1: 0000000001a93a50 g2: 0000000000000000 g3: 0000000001a96170
> > > > > [    4.368439] g4: fffff8000416e9a0 g5: fffff8001dc98000 g6: fffff80004170000 g7: 0000000000000005
> > > > > [    4.368769] o0: 0000000000000000 o1: ffffffffffffffff o2: 0000000000000001 o3: fffff800040b78d8
> > > > > [    4.369095] o4: 0000000000000000 o5: 0000000000000000 sp: fffff80004172d51 ret_pc: 00000000004375ac
> > > > > [    4.369934] RPC: <dma_4u_supported+0xc/0x40>
> > > > > [    4.370160] l0: 0000000000000002 l1: 0000000000000002 l2: 0000000000000000 l3: 00000000ffffffff
> > > > > [    4.370493] l4: 0000000000000001 l5: 000000000119d2b0 l6: fffff8000416e9a0 l7: 00000000018df000
> > > > > [    4.370820] i0: 0000000000000001 i1: ffffffffffffffff i2: fffff80004657758 i3: 000000000125ac00
> > > > > [    4.371145] i4: 0000000002362400 i5: 0000000000000000 i6: fffff80004172e01 i7: 000000000050e288
> > > > > [    4.371473] I7: <dma_set_mask+0x28/0x80>
> > > > > [    4.371683] Call Trace:
> > > > > [    4.371864] [<000000000050e288>] dma_set_mask+0x28/0x80
> > > > > [    4.372123] [<0000000000a83234>] virtio_dev_probe+0x14/0x400
> > > > > [    4.372348] [<0000000000ac7a18>] really_probe+0xb8/0x340
> > > > > [    4.372555] [<0000000000ac7d64>] driver_probe_device+0x24/0x120
> > > > > [    4.372794] [<0000000000ac8010>] __driver_attach+0x90/0x1a0
> > > > > [    4.373012] [<0000000000ac5b4c>] bus_for_each_dev+0x4c/0xa0
> > > > > [    4.373226] [<0000000000ac6d80>] bus_add_driver+0x140/0x1e0
> > > > > [    4.373440] [<0000000000ac8d94>] driver_register+0x74/0x120
> > > > > [    4.373653] [<0000000001b2a690>] virtio_net_driver_init+0x74/0xa8
> > > > > [    4.373886] [<0000000000427ee0>] do_one_initcall+0x60/0x340
> > > > > [    4.374099] [<0000000001b02f50>] kernel_init_freeable+0x1bc/0x228
> > > > > [    4.374330] [<0000000000f37264>] kernel_init+0x18/0x134
> > > > > [    4.374534] [<00000000004060e8>] ret_from_fork+0x1c/0x2c
> > > > > [    4.374738] [<0000000000000000>] 0x0
> > > > > [    4.374981] Disabling lock debugging due to kernel taint
> > > > > [    4.375237] Caller[000000000050e288]: dma_set_mask+0x28/0x80
> > > > > [    4.375477] Caller[0000000000a83234]: virtio_dev_probe+0x14/0x400
> > > > > [    4.375704] Caller[0000000000ac7a18]: really_probe+0xb8/0x340
> > > > > [    4.375916] Caller[0000000000ac7d64]: driver_probe_device+0x24/0x120
> > > > > [    4.376146] Caller[0000000000ac8010]: __driver_attach+0x90/0x1a0
> > > > > [    4.376365] Caller[0000000000ac5b4c]: bus_for_each_dev+0x4c/0xa0
> > > > > [    4.376583] Caller[0000000000ac6d80]: bus_add_driver+0x140/0x1e0
> > > > > [    4.376805] Caller[0000000000ac8d94]: driver_register+0x74/0x120
> > > > > [    4.377025] Caller[0000000001b2a690]: virtio_net_driver_init+0x74/0xa8
> > > > > [    4.377262] Caller[0000000000427ee0]: do_one_initcall+0x60/0x340
> > > > > [    4.377480] Caller[0000000001b02f50]: kernel_init_freeable+0x1bc/0x228
> > > > > [    4.377721] Caller[0000000000f37264]: kernel_init+0x18/0x134
> > > > > [    4.377930] Caller[00000000004060e8]: ret_from_fork+0x1c/0x2c
> > > > > [    4.378140] Caller[0000000000000000]: 0x0
> > > > > [    4.378309] Instruction DUMP:
> > > > > [    4.378339]  80a22000
> > > > > [    4.378556]  12400006
> > > > > [    4.378654]  b0102001
> > > > > [    4.378746] <c2076658>
> > > > > [    4.378838]  b0102000
> > > > > [    4.378930]  80a04019
> > > > > [    4.379022]  b1653001
> > > > > [    4.379115]  81cfe008
> > > > > [    4.379507]  913a2000
> > > > > [    4.379617]
> > > > > [    4.380504] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> > > > >
> > > > > Reverting it fixes the problem. Bisect log attached.
> > > > >
> > > > > The reason is that dma_supported() calls dma_4u_supported(), which
> > > > > expects that dev->archdata.iommu has been initialized. However,
> > > > > this is not the case for the virtio device. Result is a null pointer
> > > > > dereference in dma_4u_supported().
> > > > >
> > > > > static int dma_4u_supported(struct device *dev, u64 device_mask)
> > > > > {
> > > > >         struct iommu *iommu = dev->archdata.iommu;
> > > > >
> > > > >         if (ali_sound_dma_hack(dev, device_mask))
> > > > >                 return 1;
> > > > >
> > > > >         if (device_mask < iommu->dma_addr_mask)
> > > > >                       ^^^^^^^^^^^^^^^^^^^^ Crash location
> > > > >                 return 0;
> > > > >         return 1;
> > > > > }
> > > > >
> > > > > Guenter
> > > >
> > > > This is from the unconditional call to dma_set_mask_and_coherent, right?
> > >
> > > Maybe not.
> > >
> > > The problem is that dma_4u_supported() is called from dma_supported(),
> > > and it assumes the IOMMU has already been initialized.
> > >
> > > We hoped to make the virtio device use direct DMA by leaving
> > > dev->dma_ops NULL.
> > >
> > > The first step in the DMA framework is obtaining the ops. In many cases,
> > > get_arch_dma_ops() returns NULL, and when the ops are NULL the framework
> > > falls back to direct DMA.
> > >
> > > static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
> > > {
> > >         if (dev->dma_ops)
> > >                 return dev->dma_ops;
> > >         return get_arch_dma_ops();
> > > }
> > >
> > > But on reflection, this is unreliable: some platforms return their own
> > > arch ops, including x86.
> > >
> > > Because dev->dma_ops takes priority over get_arch_dma_ops(), we should
> > > not leave dev->dma_ops NULL. Instead, we should define a set of dma_ops
> > > that explicitly implements direct DMA.
> >
> > Then this would duplicate the existing DMA API helpers. Could we tweak
> > sparc, or introduce a new flag in the device structure so the DMA API
> > knows it needs to use direct mapping?
> 
> Adding a new flag would be better. This is not a sparc-specific problem;
> it may occur on x86 as well.
> 
> Thanks.

Looks like I should drop this from my tree for now while you guys
work out a solution?

> 
> >
> > Adding Christoph for more comments.
> >
> > >
> > >
> > > >
> > > > > Xuan Zhuo, I feel there should be an explanation in the commit log
> > > > about making this unconditional call. Why are you making it
> > > > in probe?
> > > >
> > > > I note that different virtio transports handle this differently.
> > > > And looking at this more I feel this should maybe be reverted for now:
> > > >
> > > > Your patch does:
> > > >
> > > > @@ -243,6 +244,11 @@ static int virtio_dev_probe(struct device *_d)
> > > >         u64 driver_features;
> > > >         u64 driver_features_legacy;
> > > >
> > > > +       _d->dma_mask = &_d->coherent_dma_mask;
> > > > +       err = dma_set_mask_and_coherent(_d, DMA_BIT_MASK(64));
> > > > +       if (err)
> > > > +               return err;
> > > > +
> > > >         /* We have a driver! */
> > > >
> > > >
> > > > but for example virtio PCI has a 32 bit fallback.
> > > >
> > > > For example Virtio legacy also can't access 64 bit addresses at all,
> > > > so it does:
> > > >
> > > >         rc = dma_set_mask(&pci_dev->dev, DMA_BIT_MASK(64));
> > >
> > > dma_set_mask() will also call dma_supported().
> > >
> > >
> > > >         if (rc) {
> > > >                 rc = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(32));
> > > >         } else {
> > > >                 /*
> > > >                  * The virtio ring base address is expressed as a 32-bit PFN,
> > > >                  * with a page size of 1 << VIRTIO_PCI_QUEUE_ADDR_SHIFT.
> > > >                  */
> > > >                 dma_set_coherent_mask(&pci_dev->dev,
> > > >                                 DMA_BIT_MASK(32 + VIRTIO_PCI_QUEUE_ADDR_SHIFT));
> > > >         }
> > > >
> > > >
> > > > mmio tries a fallback to 32 bit if 64 bit fails which does not make
> > > > sense to me, but maybe I am wrong.
> > > >
> > > >
> > >
> > > These physical devices sit behind an IOMMU, so this is fine for them,
> > > but a virtio device has no physical IOMMU, which is where the problem
> > > comes from. I borrowed this approach from vDPA, so I suspect vDPA has
> > > similar problems.
> >
> > Yes.
> >
> > Thanks
> >
> > >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > >
> > > >
> > > >
> > > > > ---
> > > > > # bad: [8417c8f5007bf4567ccffda850a3157c7d905f67] Add linux-next specific files for 20230405
> > > > > # good: [7e364e56293bb98cae1b55fd835f5991c4e96e7d] Linux 6.3-rc5
> > > > > git bisect start 'HEAD' 'v6.3-rc5'
> > > > > # good: [699c146eb5a03458f29e94cfde4d7dd7a36deeb4] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> > > > > git bisect good 699c146eb5a03458f29e94cfde4d7dd7a36deeb4
> > > > > # good: [efe74e6476a9f04263288009910f07a26597386f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
> > > > > git bisect good efe74e6476a9f04263288009910f07a26597386f
> > > > > # good: [24be607eedb226c1627973190a0b65cab39440b9] Merge branch 'char-misc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
> > > > > git bisect good 24be607eedb226c1627973190a0b65cab39440b9
> > > > > # good: [5213285e3f212cf266c085c1c11041adda2bc63f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
> > > > > git bisect good 5213285e3f212cf266c085c1c11041adda2bc63f
> > > > > # bad: [e8f5d7e1787104da5977773ba5b3e1d502fdb9e3] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm.git
> > > > > git bisect bad e8f5d7e1787104da5977773ba5b3e1d502fdb9e3
> > > > > # bad: [c08b5ad3300122790cef1bf7c1f51c554c778e4d] Merge branch 'gpio/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
> > > > > git bisect bad c08b5ad3300122790cef1bf7c1f51c554c778e4d
> > > > > # good: [4ea0c97776bf8c63805eb0f8182d9c20072219d9] gpiolib: Check array_info for NULL only once in gpiod_get_array()
> > > > > git bisect good 4ea0c97776bf8c63805eb0f8182d9c20072219d9
> > > > > # good: [b8e4bb0f05bef8334e689618c75cfed122f3a292] vdpa_sim: use kthread worker
> > > > > git bisect good b8e4bb0f05bef8334e689618c75cfed122f3a292
> > > > > # bad: [ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
> > > > > git bisect bad ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e
> > > > > # good: [56c8cf37fb0800ad66c926eb3d044e50691afa6e] virtio_ring: packed: separate dma codes
> > > > > git bisect good 56c8cf37fb0800ad66c926eb3d044e50691afa6e
> > > > > # good: [f19986b93ab7aa23f76dedf9fd8657f865324f78] virtio_ring: update document for virtqueue_add_*
> > > > > git bisect good f19986b93ab7aa23f76dedf9fd8657f865324f78
> > > > > # bad: [60602b367dc928e2b3d6c1f21df5d05f90e37fa3] virtio_ring: correct the expression of the description of virtqueue_resize()
> > > > > git bisect bad 60602b367dc928e2b3d6c1f21df5d05f90e37fa3
> > > > > # bad: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
> > > > > git bisect bad 3a06353479111e1c9e072825bb0d0730e3a0f4e7
> > > > > # first bad commit: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
> > > >
> > >
> >



* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-10  6:40             ` Michael S. Tsirkin
@ 2023-04-10  6:42               ` Xuan Zhuo
  0 siblings, 0 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-10  6:42 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Christoph Hellwig, Guenter Roeck, virtualization

On Mon, 10 Apr 2023 02:40:58 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Apr 10, 2023 at 02:03:37PM +0800, Xuan Zhuo wrote:
> > On Mon, 10 Apr 2023 13:14:13 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Apr 10, 2023 at 11:48 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Fri, 7 Apr 2023 07:02:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Thu, Apr 06, 2023 at 10:20:09AM -0700, Guenter Roeck wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Mon, Mar 27, 2023 at 12:05:33PM +0800, Xuan Zhuo wrote:
> > > > > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > > > > mapped across multiple add/get buf operations.
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > > >
> > > > > > On sparc64, this patch results in
> > > > > >
> > > > > > [    4.364643] Unable to handle kernel NULL pointer dereference
> > > > > > [    4.365157] tsk->{mm,active_mm}->context = 0000000000000000
> > > > > > [    4.365394] tsk->{mm,active_mm}->pgd = fffff80000402000
> > > > > > [    4.365611]               \|/ ____ \|/
> > > > > > [    4.365611]               "@'/ .. \`@"
> > > > > > [    4.365611]               /_| \__/ |_\
> > > > > > [    4.365611]                  \__U_/
> > > > > > [    4.366165] swapper/0(1): Oops [#1]
> > > > > > [    4.366630] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.3.0-rc5-next-20230405 #1
> > > > > > [    4.367121] TSTATE: 0000004411001606 TPC: 00000000004375c0 TNPC: 00000000004375c4 Y: 00000002    Tainted: G                 N
> > > > > > [    4.367548] TPC: <dma_4u_supported+0x20/0x40>
> > > > > > [    4.368111] g0: 0000000000000000 g1: 0000000001a93a50 g2: 0000000000000000 g3: 0000000001a96170
> > > > > > [    4.368439] g4: fffff8000416e9a0 g5: fffff8001dc98000 g6: fffff80004170000 g7: 0000000000000005
> > > > > > [    4.368769] o0: 0000000000000000 o1: ffffffffffffffff o2: 0000000000000001 o3: fffff800040b78d8
> > > > > > [    4.369095] o4: 0000000000000000 o5: 0000000000000000 sp: fffff80004172d51 ret_pc: 00000000004375ac
> > > > > > [    4.369934] RPC: <dma_4u_supported+0xc/0x40>
> > > > > > [    4.370160] l0: 0000000000000002 l1: 0000000000000002 l2: 0000000000000000 l3: 00000000ffffffff
> > > > > > [    4.370493] l4: 0000000000000001 l5: 000000000119d2b0 l6: fffff8000416e9a0 l7: 00000000018df000
> > > > > > [    4.370820] i0: 0000000000000001 i1: ffffffffffffffff i2: fffff80004657758 i3: 000000000125ac00
> > > > > > [    4.371145] i4: 0000000002362400 i5: 0000000000000000 i6: fffff80004172e01 i7: 000000000050e288
> > > > > > [    4.371473] I7: <dma_set_mask+0x28/0x80>
> > > > > > [    4.371683] Call Trace:
> > > > > > [    4.371864] [<000000000050e288>] dma_set_mask+0x28/0x80
> > > > > > [    4.372123] [<0000000000a83234>] virtio_dev_probe+0x14/0x400
> > > > > > [    4.372348] [<0000000000ac7a18>] really_probe+0xb8/0x340
> > > > > > [    4.372555] [<0000000000ac7d64>] driver_probe_device+0x24/0x120
> > > > > > [    4.372794] [<0000000000ac8010>] __driver_attach+0x90/0x1a0
> > > > > > [    4.373012] [<0000000000ac5b4c>] bus_for_each_dev+0x4c/0xa0
> > > > > > [    4.373226] [<0000000000ac6d80>] bus_add_driver+0x140/0x1e0
> > > > > > [    4.373440] [<0000000000ac8d94>] driver_register+0x74/0x120
> > > > > > [    4.373653] [<0000000001b2a690>] virtio_net_driver_init+0x74/0xa8
> > > > > > [    4.373886] [<0000000000427ee0>] do_one_initcall+0x60/0x340
> > > > > > [    4.374099] [<0000000001b02f50>] kernel_init_freeable+0x1bc/0x228
> > > > > > [    4.374330] [<0000000000f37264>] kernel_init+0x18/0x134
> > > > > > [    4.374534] [<00000000004060e8>] ret_from_fork+0x1c/0x2c
> > > > > > [    4.374738] [<0000000000000000>] 0x0
> > > > > > [    4.374981] Disabling lock debugging due to kernel taint
> > > > > > [    4.375237] Caller[000000000050e288]: dma_set_mask+0x28/0x80
> > > > > > [    4.375477] Caller[0000000000a83234]: virtio_dev_probe+0x14/0x400
> > > > > > [    4.375704] Caller[0000000000ac7a18]: really_probe+0xb8/0x340
> > > > > > [    4.375916] Caller[0000000000ac7d64]: driver_probe_device+0x24/0x120
> > > > > > [    4.376146] Caller[0000000000ac8010]: __driver_attach+0x90/0x1a0
> > > > > > [    4.376365] Caller[0000000000ac5b4c]: bus_for_each_dev+0x4c/0xa0
> > > > > > [    4.376583] Caller[0000000000ac6d80]: bus_add_driver+0x140/0x1e0
> > > > > > [    4.376805] Caller[0000000000ac8d94]: driver_register+0x74/0x120
> > > > > > [    4.377025] Caller[0000000001b2a690]: virtio_net_driver_init+0x74/0xa8
> > > > > > [    4.377262] Caller[0000000000427ee0]: do_one_initcall+0x60/0x340
> > > > > > [    4.377480] Caller[0000000001b02f50]: kernel_init_freeable+0x1bc/0x228
> > > > > > [    4.377721] Caller[0000000000f37264]: kernel_init+0x18/0x134
> > > > > > [    4.377930] Caller[00000000004060e8]: ret_from_fork+0x1c/0x2c
> > > > > > [    4.378140] Caller[0000000000000000]: 0x0
> > > > > > [    4.378309] Instruction DUMP:
> > > > > > [    4.378339]  80a22000
> > > > > > [    4.378556]  12400006
> > > > > > [    4.378654]  b0102001
> > > > > > [    4.378746] <c2076658>
> > > > > > [    4.378838]  b0102000
> > > > > > [    4.378930]  80a04019
> > > > > > [    4.379022]  b1653001
> > > > > > [    4.379115]  81cfe008
> > > > > > [    4.379507]  913a2000
> > > > > > [    4.379617]
> > > > > > [    4.380504] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> > > > > >
> > > > > > Reverting it fixes the problem. Bisect log attached.
> > > > > >
> > > > > > The reason is that dma_supported() calls dma_4u_supported(), which
> > > > > > expects that dev->archdata.iommu has been initialized. However,
> > > > > > this is not the case for the virtio device. Result is a null pointer
> > > > > > dereference in dma_4u_supported().
> > > > > >
> > > > > > static int dma_4u_supported(struct device *dev, u64 device_mask)
> > > > > > {
> > > > > >         struct iommu *iommu = dev->archdata.iommu;
> > > > > >
> > > > > >         if (ali_sound_dma_hack(dev, device_mask))
> > > > > >                 return 1;
> > > > > >
> > > > > >         if (device_mask < iommu->dma_addr_mask)
> > > > > >                       ^^^^^^^^^^^^^^^^^^^^ Crash location
> > > > > >                 return 0;
> > > > > >         return 1;
> > > > > > }
> > > > > >
> > > > > > Guenter
> > > > >
> > > > > This is from the unconditional call to dma_set_mask_and_coherent, right?
> > > >
> > > > Maybe not.
> > > >
> > > > The problem is that dma_4u_supported() will be called from dma_supported(),
> > > > and that function assumes the IOMMU has already been initialized.
> > > >
> > > > We hoped to turn the virtio dev into a direct DMA device by leaving
> > > > dev->dma_ops NULL.
> > > >
> > > > The first step in the DMA framework is obtaining the ops. In many cases,
> > > > get_arch_dma_ops() returns NULL, and when the ops are NULL the DMA
> > > > framework uses direct DMA.
> > > >
> > > > static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
> > > > {
> > > >         if (dev->dma_ops)
> > > >                 return dev->dma_ops;
> > > >         return get_arch_dma_ops();
> > > > }
> > > >
> > > > But on rethinking, this is unreliable. Some platforms return their own
> > > > ops, including x86.
> > > >
> > > > Because the priority of dev->dma_ops is higher than get_arch_dma_ops(), we
> > > > should not let dev->dma_ops be NULL. We should define a set of dma_ops that
> > > > actively implements direct DMA.
> > >
> > > Then this will duplicate the existing DMA API helpers. Could we tweak
> > > sparc or introduce a new flag for the device structure so the
> > > DMA API knows it needs to use direct mapping?
> >
> > Adding a new flag would be better. This is not a sparc-only problem; x86
> > may hit it as well.
> >
> > Thanks.
>
> Looks like I should drop this from my tree for now while you guys
> work out a solution?


Yes

Thanks.


>
> >
> > >
> > > Adding Christoph for more comments.
> > >
> > > >
> > > >
> > > > >
> > > > > Xuan Zhuo, I feel there should be an explanation in the commit log
> > > > > about making this unconditional call. Why are you making it
> > > > > in probe?
> > > > >
> > > > > I note that different virtio transports handle this differently.
> > > > > And looking at this more I feel this should maybe be reverted for now:
> > > > >
> > > > > Your patch does:
> > > > >
> > > > > @@ -243,6 +244,11 @@ static int virtio_dev_probe(struct device *_d)
> > > > >         u64 driver_features;
> > > > >         u64 driver_features_legacy;
> > > > >
> > > > > +       _d->dma_mask = &_d->coherent_dma_mask;
> > > > > +       err = dma_set_mask_and_coherent(_d, DMA_BIT_MASK(64));
> > > > > +       if (err)
> > > > > +               return err;
> > > > > +
> > > > >         /* We have a driver! */
> > > > >
> > > > >
> > > > > but for example virtio PCI has a 32 bit fallback.
> > > > >
> > > > > For example Virtio legacy also can't access 64 bit addresses at all,
> > > > > so it does:
> > > > >
> > > > >         rc = dma_set_mask(&pci_dev->dev, DMA_BIT_MASK(64));
> > > >
> > > > dma_set_mask() will also call dma_supported().
> > > >
> > > >
> > > > >         if (rc) {
> > > > >                 rc = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(32));
> > > > >         } else {
> > > > >                 /*
> > > > >                  * The virtio ring base address is expressed as a 32-bit PFN,
> > > > >                  * with a page size of 1 << VIRTIO_PCI_QUEUE_ADDR_SHIFT.
> > > > >                  */
> > > > >                 dma_set_coherent_mask(&pci_dev->dev,
> > > > >                                 DMA_BIT_MASK(32 + VIRTIO_PCI_QUEUE_ADDR_SHIFT));
> > > > >         }
> > > > >
> > > > >
> > > > > mmio tries a fallback to 32 bit if 64 bit fails, which does not make
> > > > > sense to me, but maybe I am wrong.
> > > > >
> > > > >
> > > >
> > > > These physical devices have an IOMMU, so it is normal, but the virtio
> > > > device does not have any physical IOMMU, so there are problems. I borrowed
> > > > this from vDPA, so I think vDPA has similar problems.
> > >
> > > Yes.
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > ---
> > > > > > # bad: [8417c8f5007bf4567ccffda850a3157c7d905f67] Add linux-next specific files for 20230405
> > > > > > # good: [7e364e56293bb98cae1b55fd835f5991c4e96e7d] Linux 6.3-rc5
> > > > > > git bisect start 'HEAD' 'v6.3-rc5'
> > > > > > # good: [699c146eb5a03458f29e94cfde4d7dd7a36deeb4] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> > > > > > git bisect good 699c146eb5a03458f29e94cfde4d7dd7a36deeb4
> > > > > > # good: [efe74e6476a9f04263288009910f07a26597386f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
> > > > > > git bisect good efe74e6476a9f04263288009910f07a26597386f
> > > > > > # good: [24be607eedb226c1627973190a0b65cab39440b9] Merge branch 'char-misc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
> > > > > > git bisect good 24be607eedb226c1627973190a0b65cab39440b9
> > > > > > # good: [5213285e3f212cf266c085c1c11041adda2bc63f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
> > > > > > git bisect good 5213285e3f212cf266c085c1c11041adda2bc63f
> > > > > > # bad: [e8f5d7e1787104da5977773ba5b3e1d502fdb9e3] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm.git
> > > > > > git bisect bad e8f5d7e1787104da5977773ba5b3e1d502fdb9e3
> > > > > > # bad: [c08b5ad3300122790cef1bf7c1f51c554c778e4d] Merge branch 'gpio/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
> > > > > > git bisect bad c08b5ad3300122790cef1bf7c1f51c554c778e4d
> > > > > > # good: [4ea0c97776bf8c63805eb0f8182d9c20072219d9] gpiolib: Check array_info for NULL only once in gpiod_get_array()
> > > > > > git bisect good 4ea0c97776bf8c63805eb0f8182d9c20072219d9
> > > > > > # good: [b8e4bb0f05bef8334e689618c75cfed122f3a292] vdpa_sim: use kthread worker
> > > > > > git bisect good b8e4bb0f05bef8334e689618c75cfed122f3a292
> > > > > > # bad: [ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
> > > > > > git bisect bad ac1fa106f52a3ea2d2718a0de7d532d3d6c03e4e
> > > > > > # good: [56c8cf37fb0800ad66c926eb3d044e50691afa6e] virtio_ring: packed: separate dma codes
> > > > > > git bisect good 56c8cf37fb0800ad66c926eb3d044e50691afa6e
> > > > > > # good: [f19986b93ab7aa23f76dedf9fd8657f865324f78] virtio_ring: update document for virtqueue_add_*
> > > > > > git bisect good f19986b93ab7aa23f76dedf9fd8657f865324f78
> > > > > > # bad: [60602b367dc928e2b3d6c1f21df5d05f90e37fa3] virtio_ring: correct the expression of the description of virtqueue_resize()
> > > > > > git bisect bad 60602b367dc928e2b3d6c1f21df5d05f90e37fa3
> > > > > > # bad: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
> > > > > > git bisect bad 3a06353479111e1c9e072825bb0d0730e3a0f4e7
> > > > > > # first bad commit: [3a06353479111e1c9e072825bb0d0730e3a0f4e7] virtio_ring: introduce virtqueue_dma_dev()
> > > > >
> > > >
> > >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-10  5:14         ` Jason Wang
  2023-04-10  6:03           ` Xuan Zhuo
@ 2023-04-10 15:27           ` Christoph Hellwig
  2023-04-11  1:56             ` Xuan Zhuo
  2023-04-11  7:12           ` Xuan Zhuo
  2 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-10 15:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, Xuan Zhuo, virtualization, Guenter Roeck,
	Michael S. Tsirkin

On Mon, Apr 10, 2023 at 01:14:13PM +0800, Jason Wang wrote:
> > But on rethinking, this is unreliable. Some platforms return their own
> > ops, including x86.
> >
> > Because the priority of dev->dma_ops is higher than get_arch_dma_ops(), we
> > should not let dev->dma_ops be NULL. We should define a set of dma_ops that
> > actively implements direct DMA.
> 
> Then this will duplicate the existing DMA API helpers. Could we tweak
> sparc or introduce a new flag for the device structure so the
> DMA API knows it needs to use direct mapping?
> 
> Adding Christoph for more comments.

Code outside of the core kernel/dma/ code has no business doing
anything with the dma_ops.  WTF is going on?

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-10 15:27           ` Christoph Hellwig
@ 2023-04-11  1:56             ` Xuan Zhuo
  2023-04-11  2:09               ` Jason Wang
  2023-04-11  3:26               ` Christoph Hellwig
  0 siblings, 2 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-11  1:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin

On Mon, 10 Apr 2023 08:27:14 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Apr 10, 2023 at 01:14:13PM +0800, Jason Wang wrote:
> > > But on rethinking, this is unreliable. Some platforms return their own
> > > ops, including x86.
> > >
> > > Because the priority of dev->dma_ops is higher than get_arch_dma_ops(), we
> > > should not let dev->dma_ops be NULL. We should define a set of dma_ops that
> > > actively implements direct DMA.
> >
> > Then this will duplicate the existing DMA API helpers. Could we tweak
> > sparc or introduce a new flag for the device structure so the
> > DMA API knows it needs to use direct mapping?
> >
> > Adding Christoph for more comments.
>
> Code outside of the core kernel/dma/ code has no business doing
> anything with the dma_ops.

Do you mean we should not change the dma_ops from outside the core
kernel/dma/ code?

How about adding a new "dma_direct" flag to struct device?

--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -88,6 +88,9 @@ struct dma_map_ops {

 static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
 {
+       if (dev->dma_direct)
+               return NULL;
+
        if (dev->dma_ops)
                return dev->dma_ops;
        return get_arch_dma_ops();


Thanks.



> WTF is going on?

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  1:56             ` Xuan Zhuo
@ 2023-04-11  2:09               ` Jason Wang
  2023-04-11  3:31                 ` Christoph Hellwig
  2023-04-11  3:26               ` Christoph Hellwig
  1 sibling, 1 reply; 57+ messages in thread
From: Jason Wang @ 2023-04-11  2:09 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin

On Tue, Apr 11, 2023 at 10:01 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 10 Apr 2023 08:27:14 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > On Mon, Apr 10, 2023 at 01:14:13PM +0800, Jason Wang wrote:
> > > > But on rethinking, this is unreliable. Some platforms return their own
> > > > ops, including x86.
> > > >
> > > > Because the priority of dev->dma_ops is higher than get_arch_dma_ops(), we
> > > > should not let dev->dma_ops be NULL. We should define a set of dma_ops that
> > > > actively implements direct DMA.
> > >
> > > Then this will duplicate the existing DMA API helpers. Could we tweak
> > > sparc or introduce a new flag for the device structure so the
> > > DMA API knows it needs to use direct mapping?
> > >
> > > Adding Christoph for more comments.
> >
> > Code outside of the core kernel/dma/ code has no business doing
> > anything with the dma_ops.
>
> Do you mean we should not change the dma_ops from outside the core
> kernel/dma/ code?
>
> How about adding a new "dma_direct" flag to struct device?
>
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -88,6 +88,9 @@ struct dma_map_ops {
>
>  static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
>  {
> +       if (dev->dma_direct)
> +               return NULL;
> +
>         if (dev->dma_ops)
>                 return dev->dma_ops;
>         return get_arch_dma_ops();
>
>
> Thanks.
>
>
>
> > WTF is going on?

We want to support AF_XDP for virtio-net. It means AF_XDP needs to
know the dma device to perform DMA mapping. So we introduce a helper
to expose the dma dev of the virtio device.

This works fine as long as VIRTIO_F_ACCESS_PLATFORM is negotiated. But
when it is not, the virtio driver needs to use a physical address so
we want to expose the virtio device without dma_ops in the hope that
it will go for direct mapping where the physical address is used. But
it may not work on some specific setups (arches that assume an IOMMU
or have arch dma ops).

Thanks

>


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  1:56             ` Xuan Zhuo
  2023-04-11  2:09               ` Jason Wang
@ 2023-04-11  3:26               ` Christoph Hellwig
  2023-04-11  6:23                 ` Xuan Zhuo
  1 sibling, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11  3:26 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin

On Tue, Apr 11, 2023 at 09:56:19AM +0800, Xuan Zhuo wrote:
> Do you mean we should not change the dma_ops from outside the core
> kernel/dma/ code?

Basically, yes.  That plus probing for the IOMMU drivers.

> How about adding a new "dma_direct" flag to struct device?

Why would we?

Can you please explain what you're even trying to do?  But checking
a dma_direct flag for sure isn't the solution.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  2:09               ` Jason Wang
@ 2023-04-11  3:31                 ` Christoph Hellwig
  2023-04-11  3:56                   ` Jason Wang
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11  3:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, Xuan Zhuo, virtualization, Guenter Roeck,
	Michael S. Tsirkin

On Tue, Apr 11, 2023 at 10:09:40AM +0800, Jason Wang wrote:
> We want to support AF_XDP for virtio-net. It means AF_XDP needs to
> know the dma device to perform DMA mapping. So we introduce a helper
> to expose the dma dev of the virtio device.

The whole virtio architecture is based around the core code doing
the DMA mapping.  I can't see how you can just export a helper to
expose the dma device.  You'd have to completely rework the layering
of the virtio code if you want to do it in the upper level drivers.
And why would you want to do this?  The low-level code is the only
piece that can actually know if you need to do a dma mapping.  All
the kernel subsystems that don't do it inside the low-level drivers
or closely associated helpers are a giant and hard-to-fix mess
(see usb for the prime exhibit).

So the first question is:  why do you want this for AF_XDP, and
the next question will be how to do that without making a complete
mess.

> This works fine as long as VIRTIO_F_ACCESS_PLATFORM is negotiated. But
> when it is not, the virtio driver needs to use a physical address so
> we want to expose the virtio device without dma_ops in the hope that
> it will go for direct mapping where the physical address is used. But
> it may not work on some specific setups (arches that assume an IOMMU
> or have arch dma ops).

The DMA device for virtio_pci is the underlying PCI device, always.
!VIRTIO_F_ACCESS_PLATFORM means there is no dma device at all.  Because
of all these things you can't just expose a pointer to the dma_device
as that is just a completely wrong way of thinking about the problem.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  3:31                 ` Christoph Hellwig
@ 2023-04-11  3:56                   ` Jason Wang
  2023-04-11  4:09                     ` Christoph Hellwig
  2023-04-11  6:11                     ` Xuan Zhuo
  0 siblings, 2 replies; 57+ messages in thread
From: Jason Wang @ 2023-04-11  3:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, virtualization, Guenter Roeck, Michael S. Tsirkin

On Tue, Apr 11, 2023 at 11:31 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Apr 11, 2023 at 10:09:40AM +0800, Jason Wang wrote:
> > We want to support AF_XDP for virtio-net. It means AF_XDP needs to
> > know the dma device to perform DMA mapping. So we introduce a helper
> > to expose the dma dev of the virtio device.
>
> The whole virtio architecture is based around the core code doing
> the DMA mapping.  I can't see how you can just export a helper to
> expose the dma device.  You'd have to completely rework the layering
> of the virtio code if you want to do it in the upper level drivers.
> And why would you want to do this?  The low-level code is the only
> piece that can actually know if you need to do a dma mapping.  All
> the kernel subsystems that don't do it inside the low-level drivers
> or closely associated helpers are a giant and hard-to-fix mess
> (see usb for the prime exhibit).
>
> So the first question is:  why do you want this for AF_XDP,

Xuan, is it possible to set up the DMA mapping inside the virtio
driver itself? I vaguely remember at least the RX buffer mapping is
done by the driver. If this is true, we can avoid exposing DMA details
to the upper layer.

> and
> the next question will be how to do that without making a complete
> mess.
>
> > This works fine as long as VIRTIO_F_ACCESS_PLATFORM is negotiated. But
> > when it is not, the virtio driver needs to use a physical address so
> > we want to expose the virtio device without dma_ops in the hope that
> > it will go for direct mapping where the physical address is used. But
> > it may not work on some specific setups (arches that assume an IOMMU
> > or have arch dma ops).
>
> The DMA device for virtio_pci is the underlying PCI device, always.
> !VIRTIO_F_ACCESS_PLATFORM means there is no dma device at all.  Because
> of all these things you can't just expose a pointer to the dma_device
> as that is just a completely wrong way of thinking about the problem.

Ok, so if there's no DMA at all we should avoid using the DMA API
completely. This means we should check dma_dev against NULL in
virtio_has_dma_quirk().

Thanks

>


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  3:56                   ` Jason Wang
@ 2023-04-11  4:09                     ` Christoph Hellwig
  2023-04-11  4:54                       ` Jason Wang
  2023-04-11  6:11                     ` Xuan Zhuo
  1 sibling, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11  4:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, Xuan Zhuo, virtualization, Guenter Roeck,
	Michael S. Tsirkin

On Tue, Apr 11, 2023 at 11:56:47AM +0800, Jason Wang wrote:
> > The DMA device for virtio_pci is the underlying PCI device, always.
> > !VIRTIO_F_ACCESS_PLATFORM means there is no dma device at all.  Because
> > of all these things you can't just expose a pointer to the dma_device
> > as that is just a completely wrong way of thinking about the problem.
> 
> Ok, so if there's no DMA at all we should avoid using the DMA API
> completely. This means we should check dma_dev against NULL in
> virtio_has_dma_quirk().

No need to add a check to virtio_has_dma_quirk.

But looking at virtio_has_dma_quirk shows that virtio-gpu is
pretty messed up already as well.

It can't really poke into the DMA details.  We'll need core virtio
helpers for allocating and syncing an sg_table instead and make
virtio_has_dma_quirk private to the core.


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  4:09                     ` Christoph Hellwig
@ 2023-04-11  4:54                       ` Jason Wang
  2023-04-11  5:14                         ` Christoph Hellwig
  2023-04-11  8:55                         ` Gerd Hoffmann
  0 siblings, 2 replies; 57+ messages in thread
From: Jason Wang @ 2023-04-11  4:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, virtualization, Guenter Roeck, Michael S. Tsirkin

On Tue, Apr 11, 2023 at 12:10 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Apr 11, 2023 at 11:56:47AM +0800, Jason Wang wrote:
> > > The DMA device for virtio_pci is the underlying PCI device, always.
> > > !VIRTIO_F_ACCESS_PLATFORM means there is no dma device at all.  Because
> > > of all these things you can't just expose a pointer to the dma_device
> > > as that is just a completely wrong way of thinking about the problem.
> >
> > Ok, so if there's no DMA at all we should avoid using the DMA API
> > completely. This means we should check dma_dev against NULL in
> > virtio_has_dma_quirk().
>
> No need to add a check to virtio_has_dma_quirk.

Ok, just to clarify: I meant there could be a case where the virtqueue
is emulated by software; in this case we need to check in
vring_use_dma_api() whether the virtqueue has a DMA device or not. If
not, we need to return false.

>
> But looking at virtio_has_dma_quirk shows that virtio-gpu is
> pretty messed up already as well.
>
> It can't really poke into the DMA details.  We'll need core virtio
> helpers for allocating and syncing a sg_table instead and make
> virtio_has_dma_quirk private to the core.

Adding Gerd.

Thanks

>


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  4:54                       ` Jason Wang
@ 2023-04-11  5:14                         ` Christoph Hellwig
  2023-04-11  5:19                           ` Jason Wang
  2023-04-11  8:55                         ` Gerd Hoffmann
  1 sibling, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11  5:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Michael S. Tsirkin, virtualization, Christoph Hellwig,
	Guenter Roeck

On Tue, Apr 11, 2023 at 12:54:46PM +0800, Jason Wang wrote:
> Ok, just to clarify: I meant there could be a case where the virtqueue
> is emulated by software; in this case we need to check in
> vring_use_dma_api() whether the virtqueue has a DMA device or not. If
> not, we need to return false.

Well, that's what vring_use_dma_api basically does.  Such devices
just should never have VIRTIO_F_ACCESS_PLATFORM set.  If there is
a really good reason for such a device to have VIRTIO_F_ACCESS_PLATFORM
set, we'd need an extra quirk in vring_use_dma_api indeed.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  5:14                         ` Christoph Hellwig
@ 2023-04-11  5:19                           ` Jason Wang
  0 siblings, 0 replies; 57+ messages in thread
From: Jason Wang @ 2023-04-11  5:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, virtualization, Guenter Roeck, Michael S. Tsirkin

On Tue, Apr 11, 2023 at 1:14 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Apr 11, 2023 at 12:54:46PM +0800, Jason Wang wrote:
> > Ok, just to clarify: I meant there could be a case where the virtqueue
> > is emulated by software; in this case we need to check in
> > vring_use_dma_api() whether the virtqueue has a DMA device or not. If
> > not, we need to return false.
>
> Well, that's what vring_use_dma_api basically does.  Such devices
> just should never have VIRTIO_F_ACCESS_PLATFORM set.  If there is
> a really good reason for such a device to have VIRTIO_F_ACCESS_PLATFORM
> set, we'd need an extra quirk in vring_use_dma_api indeed.
>

This is useful for some vDPA devices where the control virtqueue is
emulated by the vDPA parent (which decodes the control virtqueue
commands into vendor-specific control commands).

Thanks


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  3:56                   ` Jason Wang
  2023-04-11  4:09                     ` Christoph Hellwig
@ 2023-04-11  6:11                     ` Xuan Zhuo
  2023-04-11  6:20                       ` Christoph Hellwig
  1 sibling, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-11  6:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, Michael S. Tsirkin, Guenter Roeck, virtualization

On Tue, 11 Apr 2023 11:56:47 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Apr 11, 2023 at 11:31 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Tue, Apr 11, 2023 at 10:09:40AM +0800, Jason Wang wrote:
> > > We want to support AF_XDP for virtio-net. It means AF_XDP needs to
> > > know the dma device to perform DMA mapping. So we introduce a helper
> > > to expose the dma dev of the virtio device.
> >
> > The whole virtio architecture is based around the core code doing
> > the DMA mapping.  I can't see how you can just export a helper to
> > expose the dma device.  You'd have to completely rework the layering
> > of the virtio code if you want to do it in the upper level drivers.
> > And why would you want to do this?  The low-level code is the only
> > piece that can actually know if you need to do a dma mapping.  All
> > the kernel subsystems that don't do it inside the low-level drivers
> > or closely associated helpers are a giant and hard-to-fix mess
> > (see usb for the prime exhibit).
> >
> > So the first question is:  why do you want this for AF_XDP,
>
> Xuan, is it possible to set up the DMA mapping inside the virtio
> driver itself? I vaguely remember at least the RX buffer mapping is
> done by the driver. If this is true, we can avoid exposing DMA details
> to the upper layer.


No, all DMA mapping is done inside the XDP socket. That is done
at setup time.

When adding buffers to the RX ring, the XDP socket will do a DMA sync.


>
> > and
> > the next question will be how to do that without making a complete
> > mess.
> >
> > > This works fine as long as VIRTIO_F_ACCESS_PLATFORM is negotiated. But
> > > when it is not, the virtio driver needs to use a physical address so
> > > we want to expose the virtio device without dma_ops in the hope that
> > > it will go for direct mapping where the physical address is used. But
> > > it may not work on some specific setups (arches that assume an IOMMU
> > > or have arch dma ops).
> >
> > The DMA device for virtio_pci is the underlying PCI device, always.
> > !VIRTIO_F_ACCESS_PLATFORM means there is no dma device at all.  Because
> > of all these things you can't just expose a pointer to the dma_device
> > as that is just a completely wrong way of thinking about the problem.
>
> Ok, so if there's no DMA at all we should avoid using the DMA API
> completely. This means we should check dma_dev against NULL in
> virtio_has_dma_quirk().
>
> Thanks
>
> >
>

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:11                     ` Xuan Zhuo
@ 2023-04-11  6:20                       ` Christoph Hellwig
  2023-04-11  6:28                         ` Xuan Zhuo
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11  6:20 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, Michael S. Tsirkin, Guenter Roeck, virtualization

On Tue, Apr 11, 2023 at 02:11:17PM +0800, Xuan Zhuo wrote:
> No, all DMA mapping is done inside the XDP socket. That is done
> at setup time.

That is just completely broken, virtio or not.  High-level code like
sockets must never do DMA mappings themselves.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  3:26               ` Christoph Hellwig
@ 2023-04-11  6:23                 ` Xuan Zhuo
  2023-04-11  6:30                   ` Christoph Hellwig
  0 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-11  6:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin

On Mon, 10 Apr 2023 20:26:20 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Apr 11, 2023 at 09:56:19AM +0800, Xuan Zhuo wrote:
> > Do you mean we should not change the dma_ops from outside the core
> > kernel/dma/ code?
>
> Basically, yes.  That plus probing for the IOMMU drivers.
>
> > How about adding a new "dma_direct" flag to struct device?
>
> Why would we?
>
> Can you please explain what you're even trying to do?

I want to force a device to use direct DMA mapping.

Thanks.

> But checking
> a dma_direct flag for sure isn't the solution.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:20                       ` Christoph Hellwig
@ 2023-04-11  6:28                         ` Xuan Zhuo
  2023-04-11  6:46                           ` Christoph Hellwig
  2023-04-12  2:03                           ` Xuan Zhuo
  0 siblings, 2 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-11  6:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Maciej Fijalkowski, Michael S. Tsirkin, virtualization,
	Christoph Hellwig, Björn Töpel, Guenter Roeck,
	Magnus Karlsson

On Mon, 10 Apr 2023 23:20:59 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Apr 11, 2023 at 02:11:17PM +0800, Xuan Zhuo wrote:
> > No, all DMA mapping is done inside the XDP socket. That is done
> > at setup time.
>
> That is just completely broken, virtio or not.  High-level code like
> sockets must never do DMA mappings themselves.

AF_XDP provides some APIs for net drivers. These APIs do the DMA map or DMA sync.

cc AF_XDP maintainers.

If we cannot set the virtio device up as a direct DMA device, then we can only
make some modifications in AF_XDP.

Thanks.



* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:23                 ` Xuan Zhuo
@ 2023-04-11  6:30                   ` Christoph Hellwig
  2023-04-11  6:33                     ` Xuan Zhuo
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11  6:30 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin

> > Can you please explain what you're even trying to do?
> 
> I want to force a device use direct dma map.

You fundamentally can't do that.  It can't work.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:30                   ` Christoph Hellwig
@ 2023-04-11  6:33                     ` Xuan Zhuo
  2023-04-11  6:45                       ` Christoph Hellwig
  0 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-11  6:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, Michael S. Tsirkin, Guenter Roeck, virtualization

On Mon, 10 Apr 2023 23:30:29 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > Can you please explain what you're even trying to do?
> >
> > I want to force a device use direct dma map.
>
> You fundamentally can't do that.  It can't work.

Do you mean my idea is wrong?

Can you explain why? Why can't we support a device that uses only the direct map?

Thanks.




* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:33                     ` Xuan Zhuo
@ 2023-04-11  6:45                       ` Christoph Hellwig
  2023-04-11  7:23                         ` Xuan Zhuo
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11  6:45 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, Michael S. Tsirkin, Guenter Roeck, virtualization

On Tue, Apr 11, 2023 at 02:33:29PM +0800, Xuan Zhuo wrote:
> Do you mean my idea is wrong, right?
> 
> Can you explain it? Why can't we support a device that uses only dirct map?

If a direct map or not is used is a decision done by the platform code,
often based on firmware tables.  You can't just override that.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:28                         ` Xuan Zhuo
@ 2023-04-11  6:46                           ` Christoph Hellwig
  2023-04-11  6:51                             ` Xuan Zhuo
  2023-04-12  2:03                           ` Xuan Zhuo
  1 sibling, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11  6:46 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Maciej Fijalkowski, Michael S. Tsirkin, virtualization,
	Christoph Hellwig, Björn Töpel, Guenter Roeck,
	Magnus Karlsson

On Tue, Apr 11, 2023 at 02:28:00PM +0800, Xuan Zhuo wrote:
> > That is just completely broken, virtio or not.  Highlevel code like
> > sockets must never do dma mappings themselves.
> 
> AF_XDP provides some API for net driver. This API will do dma map or dma sync.
> 
> cc AF_XDP maintainers.

So in that case AF_XDP doesn't actually require a dma device as you
claimed early and gets the layering right after all.  Just don't use that
API from a network driver that doesn't need to do dma mappings like
virtio for the !platform_access case then.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:46                           ` Christoph Hellwig
@ 2023-04-11  6:51                             ` Xuan Zhuo
  2023-04-11  7:04                               ` Xuan Zhuo
  0 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-11  6:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Maciej Fijalkowski, Michael S. Tsirkin, virtualization,
	Christoph Hellwig, Björn Töpel, Guenter Roeck,
	Magnus Karlsson

On Mon, 10 Apr 2023 23:46:56 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Apr 11, 2023 at 02:28:00PM +0800, Xuan Zhuo wrote:
> > > That is just completely broken, virtio or not.  Highlevel code like
> > > sockets must never do dma mappings themselves.
> >
> > AF_XDP provides some API for net driver. This API will do dma map or dma sync.
> >
> > cc AF_XDP maintainers.
>
> So in that case AF_XDP doesn't actually require a dma device as you
> claimed early and gets the layering right after all.  Just don't use that
> API from a network driver that doesn't need to do dma mappings like
> virtio for the !platform_access case then.


Yes


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:51                             ` Xuan Zhuo
@ 2023-04-11  7:04                               ` Xuan Zhuo
  0 siblings, 0 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-11  7:04 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Maciej Fijalkowski, Michael S. Tsirkin, virtualization,
	Christoph Hellwig, Björn Töpel, Magnus Karlsson,
	Guenter Roeck

On Tue, 11 Apr 2023 14:51:29 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Mon, 10 Apr 2023 23:46:56 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > On Tue, Apr 11, 2023 at 02:28:00PM +0800, Xuan Zhuo wrote:
> > > > That is just completely broken, virtio or not.  Highlevel code like
> > > > sockets must never do dma mappings themselves.
> > >
> > > AF_XDP provides some API for net driver. This API will do dma map or dma sync.
> > >
> > > cc AF_XDP maintainers.
> >
> > So in that case AF_XDP doesn't actually require a dma device as you
> > claimed early and gets the layering right after all.  Just don't use that
> > API from a network driver that doesn't need to do dma mappings like
> > virtio for the !platform_access case then.
>
>
> Yes


For virtio-net to support AF_XDP, if we cannot provide a direct DMA map device,
then we have two solutions now, but both need cooperation from AF_XDP.

1. pass NULL as the dev; AF_XDP then uses physical memory as the DMA address and skips the sync
2. do the DMA map/sync via callbacks from virtio-net
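The two options above could be sketched roughly as follows. This is a purely illustrative user-space model; the names (xsk_dma_ops, xsk_map_umem_chunk, and so on) are hypothetical and do not exist in the AF_XDP code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table a driver could hand to xsk so the driver,
 * not the socket layer, performs the actual DMA mapping (option 2). */
struct xsk_dma_ops {
	uint64_t (*map)(void *priv, void *vaddr, size_t len);
	void     (*unmap)(void *priv, uint64_t dma_addr, size_t len);
	void     (*sync)(void *priv, uint64_t dma_addr, size_t len);
};

/* For a virtio device without VIRTIO_F_ACCESS_PLATFORM, "mapping" can
 * degenerate to returning the physical address (modeled here by the
 * virtual address) and sync can be a no-op (option 1). */
static uint64_t virtio_fake_map(void *priv, void *vaddr, size_t len)
{
	(void)priv; (void)len;
	return (uint64_t)(uintptr_t)vaddr;	/* stand-in for virt_to_phys() */
}

static void virtio_fake_unmap(void *priv, uint64_t dma, size_t len)
{
	(void)priv; (void)dma; (void)len;	/* nothing to undo */
}

static void virtio_fake_sync(void *priv, uint64_t dma, size_t len)
{
	(void)priv; (void)dma; (void)len;	/* CPU and "device" share memory */
}

/* What the xsk side would do at umem registration time. */
uint64_t xsk_map_umem_chunk(const struct xsk_dma_ops *ops, void *priv,
			    void *chunk, size_t len)
{
	return ops->map(priv, chunk, len);
}
```

Either way, the point is that the socket layer never touches the DMA API directly; it calls back into the driver, which knows whether a real mapping is needed.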


Thanks.




* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-10  5:14         ` Jason Wang
  2023-04-10  6:03           ` Xuan Zhuo
  2023-04-10 15:27           ` Christoph Hellwig
@ 2023-04-11  7:12           ` Xuan Zhuo
  2023-04-11  8:59             ` Jason Wang
  2023-04-11 12:17             ` Christoph Hellwig
  2 siblings, 2 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-11  7:12 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin


Hi Jason

Can we force virtio core to use dma api without VIRTIO_F_ACCESS_PLATFORM?


Thanks.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:45                       ` Christoph Hellwig
@ 2023-04-11  7:23                         ` Xuan Zhuo
  2023-04-11 12:16                           ` Christoph Hellwig
  0 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-11  7:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, Michael S. Tsirkin, Guenter Roeck, virtualization

On Mon, 10 Apr 2023 23:45:33 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Apr 11, 2023 at 02:33:29PM +0800, Xuan Zhuo wrote:
> > Do you mean my idea is wrong, right?
> >
> > Can you explain it? Why can't we support a device that uses only dirct map?
>
> If a direct map or not is used is a decision done by the platform code,
> often based on firmware tables.  You can't just override that.


Can the virtio device set its own dma_ops? It is a device on the virtual bus, so
setting its own dma_ops seems reasonable to me.

Thanks.



* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  4:54                       ` Jason Wang
  2023-04-11  5:14                         ` Christoph Hellwig
@ 2023-04-11  8:55                         ` Gerd Hoffmann
  1 sibling, 0 replies; 57+ messages in thread
From: Gerd Hoffmann @ 2023-04-11  8:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, Xuan Zhuo, virtualization, Guenter Roeck,
	Michael S. Tsirkin

On Tue, Apr 11, 2023 at 12:54:46PM +0800, Jason Wang wrote:
> On Tue, Apr 11, 2023 at 12:10 PM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Tue, Apr 11, 2023 at 11:56:47AM +0800, Jason Wang wrote:
> > > > The DMA device for virtio_pci is the underlying PCI device, always.
> > > > !VIRTIO_F_ACCESS_PLATFORM means there is no dma device at all.  Because
> > > > of all these things you can't just expose a pointer to the dma_device
> > > > as that is just a completely wrong way of thinking about the problem.
> > >
> > > Ok, so if there's no DMA at all we should avoid using the DMA API
> > > completely. This means we should check dma_dev against NULL in
> > > virtio_has_dma_quirk().
> >
> > No nee to add a check to virtio_has_dma_quirk.
> 
> Ok, just to clarify, I meant there could be a case where the virtqueue
> is emulated by software, in this case we need check whether the
> virtqueue has a dma device or not in vring_use_dma_api(). If not we
> need return false.
> 
> > But looking at virtio_has_dma_quirk shows that virtio-gpu is
> > pretty messed up already as well.

Yes.  For gem objects allocated in guest ram virtio-gpu effectively
passes a scatter list to the host.  It doesn't use vrings for that
though, so it has to re-implement some of the logic the virtio core
already has for the vrings.

> > It can't really poke into the DMA details.  We'll need core virtio
> > helpers for allocating and syncing a sg_table instead and make
> > virtio_has_dma_quirk private to the core.

That should work.

take care,
  Gerd


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  7:12           ` Xuan Zhuo
@ 2023-04-11  8:59             ` Jason Wang
  2023-04-11 12:17             ` Christoph Hellwig
  1 sibling, 0 replies; 57+ messages in thread
From: Jason Wang @ 2023-04-11  8:59 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin

On Tue, Apr 11, 2023 at 3:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
>
> Hi Jason
>
> Can we force virtio core to use dma api without VIRTIO_F_ACCESS_PLATFORM?
>
>

I don't think so. Without ACCESS_PLATFORM it's not DMA, so DMA API
should be avoided. And we have several software devices for virtio
now.

I'd suggest involving AF_XDP maintainers to join the discussion to
seek a solution. I tend to agree with Christoph that everything will
be simplified if DMA is done at the driver level.

Thanks

> Thanks.
>


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  7:23                         ` Xuan Zhuo
@ 2023-04-11 12:16                           ` Christoph Hellwig
  2023-04-18  6:18                             ` Xuan Zhuo
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11 12:16 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, Michael S. Tsirkin, Guenter Roeck, virtualization

On Tue, Apr 11, 2023 at 03:23:43PM +0800, Xuan Zhuo wrote:
> > If a direct map or not is used is a decision done by the platform code,
> > often based on firmware tables.  You can't just override that.
> 
> 
> Can Virtio Device set its own dma_ops? It is a device on the virtual bus. It
> sets its own DMA_OPS. I think it is reasonable.

No, it can't.  virtio devices are backed by PCI, platform or other
bus devices, and the (often virtual) firmware controls how DMA mapping
is to be performed for them, at least for the platform_access case.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  7:12           ` Xuan Zhuo
  2023-04-11  8:59             ` Jason Wang
@ 2023-04-11 12:17             ` Christoph Hellwig
  1 sibling, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-11 12:17 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Guenter Roeck, Michael S. Tsirkin

On Tue, Apr 11, 2023 at 03:12:13PM +0800, Xuan Zhuo wrote:
> 
> Hi Jason
> 
> Can we force virtio core to use dma api without VIRTIO_F_ACCESS_PLATFORM?

So in your last mail you asked for it to be the other way around?
But, no you can't do this either unless you control the platform as
in the Xen PV case.


* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11  6:28                         ` Xuan Zhuo
  2023-04-11  6:46                           ` Christoph Hellwig
@ 2023-04-12  2:03                           ` Xuan Zhuo
  2023-04-12 11:54                             ` Christoph Hellwig
  1 sibling, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-12  2:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Maciej Fijalkowski, Michael S. Tsirkin, virtualization,
	Christoph Hellwig, Björn Töpel, Magnus Karlsson,
	Guenter Roeck

I have cc'ed the AF_XDP maintainers here.

To get everyone on the same page, let me summarize.

We last discussed this question at [1]. We planned to pass one device to xsk.
Then xsk can use the DMA API on this device. The device can be one hw
device (such as PCI or MMIO) or one virtio device. If it is one hw device, no
problem. If it is one virtio device, then we expected that the DMA API on it
would return a direct DMA map, because the dma_ops of the virtio device is
NULL. But on some platforms, such as sparc64 or some cases of x86, the arch
has its own dma_ops. So we were wrong.

And as discussed in this thread, we cannot get a direct DMA address from the
virtio device via the DMA API. So, we need xsk to expose the DMA ops to the
virtio-net driver when the virtio core cannot use the DMA API.

All the processing of dma inside xsk:
1. map/unmap at enable/disable
2. dma sync

Solutions:
1. xsk uses physical memory and skips the DMA sync when the dev is NULL.
2. xsk uses the callbacks passed by the driver to do the DMA.

If I have missed something, please point it out.

Thanks.

[1] https://lore.kernel.org/virtualization/1677727282.6062915-2-xuanzhuo@linux.alibaba.com/

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-12  2:03                           ` Xuan Zhuo
@ 2023-04-12 11:54                             ` Christoph Hellwig
  0 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-12 11:54 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Maciej Fijalkowski, Michael S. Tsirkin, virtualization,
	Christoph Hellwig, Björn Töpel, Magnus Karlsson,
	Guenter Roeck

On Wed, Apr 12, 2023 at 10:03:46AM +0800, Xuan Zhuo wrote:
> We discusses this question last at [1]. We planned to pass one device to xsk.
> Then xsk can use the DMA API on this device. The device can be one hw
> device(such as PCI, mmio) or virtio device. If it is one hw device, no problem.

What do you mean with one here?  A virtio device should never be backed
by more than one hardware device.

> If it is one virtio device, then we expect that DMA API on that will return
> direct dma map, because the dma_ops of the virtio device is NULL. But on some
> platform such as sparc64 or some case of x86, the arch has own dma_ops. So we
> wrong.

No, you can't expect a device to use any particular dma ops.  Most
platforms support either the direct mapping or some form of IOMMU.

But for virtio things are even more complicated - unless
VIRTIO_F_ACCESS_PLATFORM is set (which really must be set for all
real hardware devices for them to work), the DMA API isn't even used
at all.  That means the virtual platform (e.g. qemu) does DMA based
on physical addresses and virtio ignores all the platform DMA setup.

> And as the discuss of this thread, we cannot get direct dma address from
> virtio device by DMA API. So, we need xsk to expose the DMA ops to the
> virtio-net driver when virtio core can not use the DMA API.
> 
> All the processing of dma inside xsk:
> 1. map/unmap at enable/disable
> 2. dma sync

For the VIRTIO_F_ACCESS_PLATFORM you can just use the DMA API like
any other device, but this really should be done inside virtio instead
of an upper layer.

For the !VIRTIO_F_ACCESS_PLATFORM case there is no need to sync,
and the dma mapping is a simple virt_to_phys or page_to_phys, so
there's no real need to premap to start with.
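The split described here can be modeled with a small helper. This is a simplified user-space sketch of the decision, in the spirit of the kernel's vring_use_dma_api(), not the actual virtio_ring implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_ACCESS_PLATFORM (1u << 0)	/* simplified feature bit */

struct model_vdev {
	uint32_t features;
};

/* With ACCESS_PLATFORM the DMA API must be used (IOMMUs, bounce
 * buffers, encrypted guests).  Without it, the hypervisor consumes
 * guest physical addresses directly, so "mapping" is just
 * virt_to_phys and there is nothing to sync. */
static bool model_use_dma_api(const struct model_vdev *vdev)
{
	return vdev->features & VIRTIO_F_ACCESS_PLATFORM;
}

static uint64_t model_map(const struct model_vdev *vdev, void *vaddr)
{
	if (model_use_dma_api(vdev))
		return 0;	/* placeholder: would call dma_map_single() on the bus device */
	return (uint64_t)(uintptr_t)vaddr;	/* stand-in for virt_to_phys() */
}
```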

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-11 12:16                           ` Christoph Hellwig
@ 2023-04-18  6:18                             ` Xuan Zhuo
  2023-04-19  5:10                               ` Christoph Hellwig
  0 siblings, 1 reply; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-18  6:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, Michael S. Tsirkin, Guenter Roeck, virtualization

On Tue, 11 Apr 2023 05:16:19 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Apr 11, 2023 at 03:23:43PM +0800, Xuan Zhuo wrote:
> > > If a direct map or not is used is a decision done by the platform code,
> > > often based on firmware tables.  You can't just override that.
> >
> >
> > Can Virtio Device set its own dma_ops? It is a device on the virtual bus. It
> > sets its own DMA_OPS. I think it is reasonable.
>
> No, it can't.  virtio devices are backed by PCI, platform or other
> bus devices, and the (often virtual) firmware controls how DMA mapping
> is to be performed for them, at least for the platform_access case.


Sorry, after rethinking this, I think we may have misunderstood something.

First of all, let me give you a brief introduction to the virtio device and the
PCI device. If I make a mistake, please point it out.


First, when a virtio PCI device is probed, the virtio PCI driver will be
called. Then we get one pci_device.

Then virtio_pci_probe() will allocate one new device and register it to the
virtio bus by register_virtio_device().


So here we have two devices: the pci-device and the virtio-device.

If we call the DMA API inside virtio, we use the pci-device. The virtio-device
is not used for the DMA API.

Now we want to use the virtio-device to do direct DMA. The virtio-device
is created by virtio_pci_probe() of the virtio PCI driver and registered to the
virtio bus. So there is no firmware and no IOMMU, and the bus is the virtio bus;
why can we not change the dma_ops of the virtio-device?
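A simplified model of the two-device relationship described above. The structures and the probe signature here are made-up stand-ins for illustration, not the real kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct model_pci_dev {
	const char *name;		/* the physical PCI function */
};

struct model_virtio_device {
	struct model_pci_dev *parent;	/* Linux-internal child device */
};

/* Model of what virtio_pci_probe() does: allocate the virtio-device
 * and link it to the PCI device that backs it.  (The real function
 * takes a pci_dev and a pci_device_id; this signature is invented.) */
static void model_virtio_pci_probe(struct model_pci_dev *pdev,
				   struct model_virtio_device *vdev)
{
	vdev->parent = pdev;
}

/* The device the DMA API must be called on: always the parent,
 * because only the parent is known to the firmware/IOMMU tables. */
static struct model_pci_dev *model_dma_dev(struct model_virtio_device *vdev)
{
	return vdev->parent;
}
```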


Thanks.



* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-18  6:18                             ` Xuan Zhuo
@ 2023-04-19  5:10                               ` Christoph Hellwig
  2023-04-19  6:16                                 ` Xuan Zhuo
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2023-04-19  5:10 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, Michael S. Tsirkin, Guenter Roeck, virtualization

On Tue, Apr 18, 2023 at 02:18:52PM +0800, Xuan Zhuo wrote:
> Sorry, rethink about this, I think we maybe misunderstand something.
> 
> First of all, let me give you a brief introduce of virtio device and pci device.
> If I make mistake, please point out.
> 
> 
> First, when one virtio pci device is probed, then the virtio pci driver will be
> called. Then we got one pci_device.

Yes.

> Then virtio_pci_probe will alloc one new device, and register it to virtio bus
> by register_virtio_device().
> 
> 
> So here we have two device: pci-device and virtio-device.

Yes.

> If we call DMA API inside virtio, we use the pci-device. The virtio-device is
> not used for DMA API.

Exactly.

> Now we want to use the virtio-device to do direct dma. The virtio-device
> is created by virtio_pci_probe() of virtio pci driver. And register to virtio
> bus. So no firmware and not iommu and the bus is virtio bus, why we can not
> change the dma_ops of virtio-device?

Because firmware doesn't know about your virtio-device.  It is just a
made up Linux concept, and the IOMMU and firmware tables for it don't
know about it.  DMA must only ever be done on actual physical
(including "physical" devices emulated by a hypervisor) devices, not
on devices made up by Linux.

* Re: [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev()
  2023-04-19  5:10                               ` Christoph Hellwig
@ 2023-04-19  6:16                                 ` Xuan Zhuo
  0 siblings, 0 replies; 57+ messages in thread
From: Xuan Zhuo @ 2023-04-19  6:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, Michael S. Tsirkin, Guenter Roeck, virtualization

On Tue, 18 Apr 2023 22:10:03 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Apr 18, 2023 at 02:18:52PM +0800, Xuan Zhuo wrote:
> > Sorry, rethink about this, I think we maybe misunderstand something.
> >
> > First of all, let me give you a brief introduce of virtio device and pci device.
> > If I make mistake, please point out.
> >
> >
> > First, when one virtio pci device is probed, then the virtio pci driver will be
> > called. Then we got one pci_device.
>
> Yes.
>
> > Then virtio_pci_probe will alloc one new device, and register it to virtio bus
> > by register_virtio_device().
> >
> >
> > So here we have two device: pci-device and virtio-device.
>
> Yes.
>
> > If we call DMA API inside virtio, we use the pci-device. The virtio-device is
> > not used for DMA API.
>
> Exactly.
>
> > Now we want to use the virtio-device to do direct dma. The virtio-device
> > is created by virtio_pci_probe() of virtio pci driver. And register to virtio
> > bus. So no firmware and not iommu and the bus is virtio bus, why we can not
> > change the dma_ops of virtio-device?
>
> Because firmware doesn't know about your virtio-device.  It is just a
> made up Linux concept, and the IOMMU and firmware tables for it don't
> know about it.  DMA must only ever be done on actual physical
> (including "physical" devices emulated by a hypervisor) devices, not
> on devices made up by Linux.


It's clear to me now.

Thanks.

end of thread, other threads:[~2023-04-19  6:19 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-27  4:05 [PATCH vhost v6 00/11] virtio core prepares for AF_XDP Xuan Zhuo
2023-03-27  4:05 ` [PATCH vhost v6 01/11] virtio_ring: split: separate dma codes Xuan Zhuo
2023-03-28  6:26   ` Jason Wang
2023-03-27  4:05 ` [PATCH vhost v6 02/11] virtio_ring: packed: " Xuan Zhuo
2023-03-27  4:05 ` [PATCH vhost v6 03/11] virtio_ring: packed-indirect: " Xuan Zhuo
2023-03-27  4:05 ` [PATCH vhost v6 04/11] virtio_ring: split: support premapped Xuan Zhuo
2023-03-28  6:27   ` Jason Wang
2023-03-27  4:05 ` [PATCH vhost v6 05/11] virtio_ring: packed: " Xuan Zhuo
2023-03-28  6:27   ` Jason Wang
2023-03-27  4:05 ` [PATCH vhost v6 06/11] virtio_ring: packed-indirect: " Xuan Zhuo
2023-03-28  6:29   ` Jason Wang
2023-03-27  4:05 ` [PATCH vhost v6 07/11] virtio_ring: update document for virtqueue_add_* Xuan Zhuo
2023-03-28  6:30   ` Jason Wang
2023-03-27  4:05 ` [PATCH vhost v6 08/11] virtio_ring: introduce virtqueue_dma_dev() Xuan Zhuo
2023-04-06 17:20   ` Guenter Roeck
2023-04-07  3:17     ` Xuan Zhuo
2023-04-07 12:34       ` Guenter Roeck
2023-04-07 11:02     ` Michael S. Tsirkin
2023-04-10  3:29       ` Xuan Zhuo
2023-04-10  5:14         ` Jason Wang
2023-04-10  6:03           ` Xuan Zhuo
2023-04-10  6:40             ` Michael S. Tsirkin
2023-04-10  6:42               ` Xuan Zhuo
2023-04-10 15:27           ` Christoph Hellwig
2023-04-11  1:56             ` Xuan Zhuo
2023-04-11  2:09               ` Jason Wang
2023-04-11  3:31                 ` Christoph Hellwig
2023-04-11  3:56                   ` Jason Wang
2023-04-11  4:09                     ` Christoph Hellwig
2023-04-11  4:54                       ` Jason Wang
2023-04-11  5:14                         ` Christoph Hellwig
2023-04-11  5:19                           ` Jason Wang
2023-04-11  8:55                         ` Gerd Hoffmann
2023-04-11  6:11                     ` Xuan Zhuo
2023-04-11  6:20                       ` Christoph Hellwig
2023-04-11  6:28                         ` Xuan Zhuo
2023-04-11  6:46                           ` Christoph Hellwig
2023-04-11  6:51                             ` Xuan Zhuo
2023-04-11  7:04                               ` Xuan Zhuo
2023-04-12  2:03                           ` Xuan Zhuo
2023-04-12 11:54                             ` Christoph Hellwig
2023-04-11  3:26               ` Christoph Hellwig
2023-04-11  6:23                 ` Xuan Zhuo
2023-04-11  6:30                   ` Christoph Hellwig
2023-04-11  6:33                     ` Xuan Zhuo
2023-04-11  6:45                       ` Christoph Hellwig
2023-04-11  7:23                         ` Xuan Zhuo
2023-04-11 12:16                           ` Christoph Hellwig
2023-04-18  6:18                             ` Xuan Zhuo
2023-04-19  5:10                               ` Christoph Hellwig
2023-04-19  6:16                                 ` Xuan Zhuo
2023-04-11  7:12           ` Xuan Zhuo
2023-04-11  8:59             ` Jason Wang
2023-04-11 12:17             ` Christoph Hellwig
2023-03-27  4:05 ` [PATCH vhost v6 09/11] virtio_ring: correct the expression of the description of virtqueue_resize() Xuan Zhuo
2023-03-27  4:05 ` [PATCH vhost v6 10/11] virtio_ring: separate the logic of reset/enable from virtqueue_resize Xuan Zhuo
2023-03-27  4:05 ` [PATCH vhost v6 11/11] virtio_ring: introduce virtqueue_reset() Xuan Zhuo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).