* [PATCH vhost v2 0/7] virtio_net: rx enable premapped mode by default
@ 2024-04-22  7:24 Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 1/7] virtio_ring: introduce dma map api for page Xuan Zhuo
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-22  7:24 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

For the virtio drivers, we can enable premapped mode regardless of the
value of use_dma_api, because the virtio core provides the virtio DMA APIs.
So the driver can enable premapped mode unconditionally.

This patch set makes the big mode of virtio-net support premapped mode,
and enables premapped mode for rx by default.

Based on the following points, we do not use the page pool to manage these
pages:

    1. virtio-net uses the DMA APIs wrapped by the virtio core. Therefore,
       we can only prevent the page pool from performing DMA operations, and
       let the driver perform DMA operations on the allocated pages.
    2. But when the page pool releases a page, we have no chance to
       execute the dma unmap.
    3. A solution to #2 is to execute a dma unmap every time before putting
       the page back into the page pool. (This is actually wasteful; we do
       not need to unmap that frequently.)
    4. But there is another problem: we still need to use page.dma_addr to
       save the dma address, and using page.dma_addr while using the page
       pool is unsafe behavior.

    More:
        https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
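
For reference, the scheme this series ends up with, simplified from
patches 3/7 and 4/7 below (these are the helpers those patches add; the
32-bit dma_addr_t split is omitted here):

/* chain pages via the pp pointer of the page_pool sub-struct (patch 3/7) */
#define page_chain_next(p)	((struct page *)((p)->pp))
#define page_chain_add(p, n)	((p)->pp = (void *)(n))

/* store the premapped DMA address in the same sub-struct (patch 4/7) */
static void page_chain_set_dma(struct page *p, dma_addr_t addr)
{
	p->dma_addr = addr;
}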

Please review.

v2:
    1. make gcc happy in page_chain_get_dma()
        http://lore.kernel.org/all/202404221325.SX5ChRGP-lkp@intel.com

v1:
    1. discussed for using page pool
    2. use dma sync to replace the unmap for the first page

Thanks.



Xuan Zhuo (7):
  virtio_ring: introduce dma map api for page
  virtio_ring: enable premapped mode whatever use_dma_api
  virtio_net: replace private by pp struct inside page
  virtio_net: big mode support premapped
  virtio_net: enable premapped by default
  virtio_net: rx remove premapped failover code
  virtio_net: remove the misleading comment

 drivers/net/virtio_net.c     | 243 +++++++++++++++++++++++------------
 drivers/virtio/virtio_ring.c |  59 ++++++++-
 include/linux/virtio.h       |   7 +
 3 files changed, 222 insertions(+), 87 deletions(-)

--
2.32.0.3.g01195cf9f



* [PATCH vhost v2 1/7] virtio_ring: introduce dma map api for page
  2024-04-22  7:24 [PATCH vhost v2 0/7] virtio_net: rx enable premapped mode by default Xuan Zhuo
@ 2024-04-22  7:24 ` Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 2/7] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-22  7:24 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

The virtio-net big mode will use these APIs to map the pages.

dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
                                       size_t offset, size_t size,
                                       enum dma_data_direction dir,
                                       unsigned long attrs);
void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
                                   size_t size, enum dma_data_direction dir,
                                   unsigned long attrs);

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 52 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  7 +++++
 2 files changed, 59 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 70de1a9a81a3..1b9fb680cff3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -3100,6 +3100,58 @@ void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
 }
 EXPORT_SYMBOL_GPL(virtqueue_dma_unmap_single_attrs);
 
+/**
+ * virtqueue_dma_map_page_attrs - map DMA for _vq
+ * @_vq: the struct virtqueue we're talking about.
+ * @page: the page to map for DMA
+ * @offset: the offset inside the page
+ * @size: the size of the region to map
+ * @dir: DMA direction
+ * @attrs: DMA Attrs
+ *
+ * The caller calls this to do DMA mapping in advance. The returned DMA address
+ * can then be passed to this _vq when it is in premapped mode.
+ *
+ * Returns the DMA address. The caller should check it with virtqueue_dma_mapping_error().
+ */
+dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
+					size_t offset, size_t size,
+					enum dma_data_direction dir,
+					unsigned long attrs)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (!vq->use_dma_api)
+		return page_to_phys(page) + offset;
+
+	return dma_map_page_attrs(vring_dma_dev(vq), page, offset, size, dir, attrs);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_map_page_attrs);
+
+/**
+ * virtqueue_dma_unmap_page_attrs - unmap DMA for _vq
+ * @_vq: the struct virtqueue we're talking about.
+ * @addr: the dma address to unmap
+ * @size: the size of the buffer
+ * @dir: DMA direction
+ * @attrs: DMA Attrs
+ *
+ * Unmap the address that is mapped by the virtqueue_dma_map_* APIs.
+ *
+ */
+void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
+				    size_t size, enum dma_data_direction dir,
+				    unsigned long attrs)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (!vq->use_dma_api)
+		return;
+
+	dma_unmap_page_attrs(vring_dma_dev(vq), addr, size, dir, attrs);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_unmap_page_attrs);
+
 /**
  * virtqueue_dma_mapping_error - check dma address
  * @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 26c4325aa373..d6c699553979 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -228,6 +228,13 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, size
 void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
 				      size_t size, enum dma_data_direction dir,
 				      unsigned long attrs);
+dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
+					size_t offset, size_t size,
+					enum dma_data_direction dir,
+					unsigned long attrs);
+void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
+				    size_t size, enum dma_data_direction dir,
+				    unsigned long attrs);
 int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);
 
 bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost v2 2/7] virtio_ring: enable premapped mode whatever use_dma_api
  2024-04-22  7:24 [PATCH vhost v2 0/7] virtio_net: rx enable premapped mode by default Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 1/7] virtio_ring: introduce dma map api for page Xuan Zhuo
@ 2024-04-22  7:24 ` Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 3/7] virtio_net: replace private by pp struct inside page Xuan Zhuo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-22  7:24 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Now that we have the virtio DMA APIs, the driver can use premapped mode
regardless of whether the virtio core uses the DMA API or not.

So remove the use_dma_api check from
virtqueue_set_dma_premapped().
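
With this change, a driver can unconditionally do something like the
following (a sketch; rq->vq is assumed to come straight from find_vqs()
with no buffers added yet):

	err = virtqueue_set_dma_premapped(rq->vq);
	/* now only fails with -EINVAL if the vq already has buffers */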

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1b9fb680cff3..1924402e0d84 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2730,7 +2730,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  *
  * Returns zero or a negative error.
  * 0: success.
- * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
+ * -EINVAL: the vq is in use.
  */
 int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 {
@@ -2746,11 +2746,6 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 		return -EINVAL;
 	}
 
-	if (!vq->use_dma_api) {
-		END_USE(vq);
-		return -EINVAL;
-	}
-
 	vq->premapped = true;
 	vq->do_unmap = false;
 
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost v2 3/7] virtio_net: replace private by pp struct inside page
  2024-04-22  7:24 [PATCH vhost v2 0/7] virtio_net: rx enable premapped mode by default Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 1/7] virtio_ring: introduce dma map api for page Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 2/7] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
@ 2024-04-22  7:24 ` Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 4/7] virtio_net: big mode support premapped Xuan Zhuo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-22  7:24 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Currently, we chain the pages of big mode via the page's private field.
But a subsequent patch aims to make big mode support premapped mode,
which requires additional space to store the DMA address.

Within the sub-struct that contains 'private', there is no suitable
field for storing the DMA address.

		struct {	/* Page cache and anonymous pages */
			/**
			 * @lru: Pageout list, eg. active_list protected by
			 * lruvec->lru_lock.  Sometimes used as a generic list
			 * by the page owner.
			 */
			union {
				struct list_head lru;

				/* Or, for the Unevictable "LRU list" slot */
				struct {
					/* Always even, to negate PageTail */
					void *__filler;
					/* Count page's or folio's mlocks */
					unsigned int mlock_count;
				};

				/* Or, free page */
				struct list_head buddy_list;
				struct list_head pcp_list;
			};
			/* See page-flags.h for PAGE_MAPPING_FLAGS */
			struct address_space *mapping;
			union {
				pgoff_t index;		/* Our offset within mapping. */
				unsigned long share;	/* share count for fsdax */
			};
			/**
			 * @private: Mapping-private opaque data.
			 * Usually used for buffer_heads if PagePrivate.
			 * Used for swp_entry_t if PageSwapCache.
			 * Indicates order in the buddy system if PageBuddy.
			 */
			unsigned long private;
		};

But within the page pool sub-struct, there is a field called dma_addr
that is appropriate for storing the DMA address. And that sub-struct is
used by the netstack, which works to our advantage.

		struct {	/* page_pool used by netstack */
			/**
			 * @pp_magic: magic value to avoid recycling non
			 * page_pool allocated pages.
			 */
			unsigned long pp_magic;
			struct page_pool *pp;
			unsigned long _pp_mapping_pad;
			unsigned long dma_addr;
			atomic_long_t pp_ref_count;
		};

On the other hand, we should only use fields from the same sub-struct.
So this patch replaces the use of "private" with "pp".

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d1118a133..2c7a67ad4789 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -48,6 +48,14 @@ module_param(napi_tx, bool, 0644);
 
 #define VIRTIO_XDP_FLAG	BIT(0)
 
+/* In big mode, we use a page chain to manage multiple pages submitted to the
+ * ring. These pages are connected using page.pp. The following two macros are
+ * used to obtain the next page in a page chain and set the next page in the
+ * page chain.
+ */
+#define page_chain_next(p)	((struct page *)((p)->pp))
+#define page_chain_add(p, n)	((p)->pp = (void *)(n))
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -191,7 +199,7 @@ struct receive_queue {
 
 	struct virtnet_interrupt_coalesce intr_coal;
 
-	/* Chain pages by the private ptr. */
+	/* Chain pages by the page's pp struct. */
 	struct page *pages;
 
 	/* Average packet length for mergeable receive buffers. */
@@ -432,16 +440,16 @@ skb_vnet_common_hdr(struct sk_buff *skb)
 }
 
 /*
- * private is used to chain pages for big packets, put the whole
- * most recent used list in the beginning for reuse
+ * put the whole most recent used list in the beginning for reuse
  */
 static void give_pages(struct receive_queue *rq, struct page *page)
 {
 	struct page *end;
 
 	/* Find end of list, sew whole thing into vi->rq.pages. */
-	for (end = page; end->private; end = (struct page *)end->private);
-	end->private = (unsigned long)rq->pages;
+	for (end = page; page_chain_next(end); end = page_chain_next(end));
+
+	page_chain_add(end, rq->pages);
 	rq->pages = page;
 }
 
@@ -450,9 +458,9 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	struct page *p = rq->pages;
 
 	if (p) {
-		rq->pages = (struct page *)p->private;
-		/* clear private here, it is used to chain pages */
-		p->private = 0;
+		rq->pages = page_chain_next(p);
+		/* clear chain here, it is used to chain pages */
+		page_chain_add(p, NULL);
 	} else
 		p = alloc_page(gfp_mask);
 	return p;
@@ -609,7 +617,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		if (unlikely(!skb))
 			return NULL;
 
-		page = (struct page *)page->private;
+		page = page_chain_next(page);
 		if (page)
 			give_pages(rq, page);
 		goto ok;
@@ -657,7 +665,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
 				frag_size, truesize);
 		len -= frag_size;
-		page = (struct page *)page->private;
+		page = page_chain_next(page);
 		offset = 0;
 	}
 
@@ -1909,7 +1917,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 		sg_set_buf(&rq->sg[i], page_address(first), PAGE_SIZE);
 
 		/* chain new page in list head to match sg */
-		first->private = (unsigned long)list;
+		page_chain_add(first, list);
 		list = first;
 	}
 
@@ -1929,7 +1937,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 	sg_set_buf(&rq->sg[1], p + offset, PAGE_SIZE - offset);
 
 	/* chain first in list head */
-	first->private = (unsigned long)list;
+	page_chain_add(first, list);
 	err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
 				  first, gfp);
 	if (err < 0)
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-22  7:24 [PATCH vhost v2 0/7] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (2 preceding siblings ...)
  2024-04-22  7:24 ` [PATCH vhost v2 3/7] virtio_net: replace private by pp struct inside page Xuan Zhuo
@ 2024-04-22  7:24 ` Xuan Zhuo
  2024-04-23  4:36   ` Jason Wang
  2024-04-22  7:24 ` [PATCH vhost v2 5/7] virtio_net: enable premapped by default Xuan Zhuo
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-22  7:24 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

In big mode, pre-mapping DMA is beneficial because if the pages are not
used, we can reuse them without needing to unmap and remap.

We require space to store the DMA address. I use the page.dma_addr to
store the DMA address from the pp structure inside the page.

Every page retrieved from get_a_page() is mapped, and its DMA address is
stored in page.dma_addr. When a page is returned to the chain, we check
the DMA status; if it is not mapped (potentially having been unmapped),
we remap it before returning it to the chain.

Based on the following points, we do not use page pool to manage these
pages:

1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
   we can only prevent the page pool from performing DMA operations, and
   let the driver perform DMA operations on the allocated pages.
2. But when the page pool releases the page, we have no chance to
   execute dma unmap.
3. A solution to #2 is to execute dma unmap every time before putting
   the page back to the page pool. (This is actually a waste, we don't
   execute unmap so frequently.)
4. But there is another problem, we still need to use page.dma_addr to
   save the dma address. Using page.dma_addr while using page pool is
   unsafe behavior.

More:
    https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 108 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2c7a67ad4789..d4f5e65b247e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
 	return (struct virtio_net_common_hdr *)skb->cb;
 }
 
+static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
+{
+	sg->dma_address = addr;
+	sg->length = len;
+}
+
+/* For pages submitted to the ring, we need to record its dma for unmap.
+ * Here, we use the page.dma_addr and page.pp_magic to store the dma
+ * address.
+ */
+static void page_chain_set_dma(struct page *p, dma_addr_t addr)
+{
+	if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
+		p->dma_addr = lower_32_bits(addr);
+		p->pp_magic = upper_32_bits(addr);
+	} else {
+		p->dma_addr = addr;
+	}
+}
+
+static dma_addr_t page_chain_get_dma(struct page *p)
+{
+	if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
+		u64 addr;
+
+		addr = p->pp_magic;
+		return (addr << 32) + p->dma_addr;
+	} else {
+		return p->dma_addr;
+	}
+}
+
+static void page_chain_sync_for_cpu(struct receive_queue *rq, struct page *p)
+{
+	virtqueue_dma_sync_single_range_for_cpu(rq->vq, page_chain_get_dma(p),
+						0, PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
+static void page_chain_unmap(struct receive_queue *rq, struct page *p, bool sync)
+{
+	int attr = 0;
+
+	if (!sync)
+		attr = DMA_ATTR_SKIP_CPU_SYNC;
+
+	virtqueue_dma_unmap_page_attrs(rq->vq, page_chain_get_dma(p), PAGE_SIZE,
+				       DMA_FROM_DEVICE, attr);
+}
+
+static int page_chain_map(struct receive_queue *rq, struct page *p)
+{
+	dma_addr_t addr;
+
+	addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
+	if (virtqueue_dma_mapping_error(rq->vq, addr))
+		return -ENOMEM;
+
+	page_chain_set_dma(p, addr);
+	return 0;
+}
+
+static void page_chain_release(struct receive_queue *rq)
+{
+	struct page *p, *n;
+
+	for (p = rq->pages; p; p = n) {
+		n = page_chain_next(p);
+
+		page_chain_unmap(rq, p, true);
+		__free_pages(p, 0);
+	}
+
+	rq->pages = NULL;
+}
+
 /*
  * put the whole most recent used list in the beginning for reuse
  */
@@ -461,8 +536,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 		rq->pages = page_chain_next(p);
 		/* clear chain here, it is used to chain pages */
 		page_chain_add(p, NULL);
-	} else
+	} else {
 		p = alloc_page(gfp_mask);
+
+		if (page_chain_map(rq, p)) {
+			__free_pages(p, 0);
+			return NULL;
+		}
+	}
+
 	return p;
 }
 
@@ -617,9 +699,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		if (unlikely(!skb))
 			return NULL;
 
-		page = page_chain_next(page);
-		if (page)
-			give_pages(rq, page);
+		if (!vi->mergeable_rx_bufs) {
+			page_chain_unmap(rq, page, false);
+			page = page_chain_next(page);
+			if (page)
+				give_pages(rq, page);
+		}
 		goto ok;
 	}
 
@@ -662,6 +747,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	BUG_ON(offset >= PAGE_SIZE);
 	while (len) {
 		unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
+
+		page_chain_unmap(rq, page, !offset);
+
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
 				frag_size, truesize);
 		len -= frag_size;
@@ -828,7 +916,8 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
 
 	rq = &vi->rq[i];
 
-	if (rq->do_dma)
+	/* Skip the unmap for big mode. */
+	if (!vi->big_packets || vi->mergeable_rx_bufs)
 		virtnet_rq_unmap(rq, buf, 0);
 
 	virtnet_rq_free_buf(vi, rq, buf);
@@ -1351,8 +1440,12 @@ static struct sk_buff *receive_big(struct net_device *dev,
 				   struct virtnet_rq_stats *stats)
 {
 	struct page *page = buf;
-	struct sk_buff *skb =
-		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, 0);
+	struct sk_buff *skb;
+
+	/* Sync the first page. The following code may read this page. */
+	page_chain_sync_for_cpu(rq, page);
+
+	skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, 0);
 
 	u64_stats_add(&stats->bytes, len - vi->hdr_len);
 	if (unlikely(!skb))
@@ -1901,7 +1994,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 			   gfp_t gfp)
 {
 	struct page *first, *list = NULL;
-	char *p;
+	dma_addr_t p;
 	int i, err, offset;
 
 	sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2);
@@ -1914,7 +2007,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 				give_pages(rq, list);
 			return -ENOMEM;
 		}
-		sg_set_buf(&rq->sg[i], page_address(first), PAGE_SIZE);
+		sg_fill_dma(&rq->sg[i], page_chain_get_dma(first), PAGE_SIZE);
 
 		/* chain new page in list head to match sg */
 		page_chain_add(first, list);
@@ -1926,15 +2019,16 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 		give_pages(rq, list);
 		return -ENOMEM;
 	}
-	p = page_address(first);
+
+	p = page_chain_get_dma(first);
 
 	/* rq->sg[0], rq->sg[1] share the same page */
 	/* a separated rq->sg[0] for header - required in case !any_header_sg */
-	sg_set_buf(&rq->sg[0], p, vi->hdr_len);
+	sg_fill_dma(&rq->sg[0], p, vi->hdr_len);
 
 	/* rq->sg[1] for data packet, from offset */
 	offset = sizeof(struct padded_vnet_hdr);
-	sg_set_buf(&rq->sg[1], p + offset, PAGE_SIZE - offset);
+	sg_fill_dma(&rq->sg[1], p + offset, PAGE_SIZE - offset);
 
 	/* chain first in list head */
 	page_chain_add(first, list);
@@ -2136,7 +2230,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 		}
 	} else {
 		while (packets < budget &&
-		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
+		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
 			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
 			packets++;
 		}
@@ -4257,8 +4351,7 @@ static void _free_receive_bufs(struct virtnet_info *vi)
 	int i;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		while (vi->rq[i].pages)
-			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
+		page_chain_release(&vi->rq[i]);
 
 		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
 		RCU_INIT_POINTER(vi->rq[i].xdp_prog, NULL);
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost v2 5/7] virtio_net: enable premapped by default
  2024-04-22  7:24 [PATCH vhost v2 0/7] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (3 preceding siblings ...)
  2024-04-22  7:24 ` [PATCH vhost v2 4/7] virtio_net: big mode support premapped Xuan Zhuo
@ 2024-04-22  7:24 ` Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 6/7] virtio_net: rx remove premapped failover code Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 7/7] virtio_net: remove the misleading comment Xuan Zhuo
  6 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-22  7:24 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Currently, the big, mergeable, and small modes all support premapped mode,
so we can now enable premapped mode by default. Furthermore,
virtqueue_set_dma_premapped() must succeed when called immediately after
find_vqs(). Consequently, we can assume that premapped mode is always
enabled.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d4f5e65b247e..f04b10426b8f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -896,13 +896,9 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
 {
 	int i;
 
-	/* disable for big mode */
-	if (!vi->mergeable_rx_bufs && vi->big_packets)
-		return;
-
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		if (virtqueue_set_dma_premapped(vi->rq[i].vq))
-			continue;
+		/* this error never happens */
+		BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));
 
 		vi->rq[i].do_dma = true;
 	}
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost v2 6/7] virtio_net: rx remove premapped failover code
  2024-04-22  7:24 [PATCH vhost v2 0/7] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (4 preceding siblings ...)
  2024-04-22  7:24 ` [PATCH vhost v2 5/7] virtio_net: enable premapped by default Xuan Zhuo
@ 2024-04-22  7:24 ` Xuan Zhuo
  2024-04-22  7:24 ` [PATCH vhost v2 7/7] virtio_net: remove the misleading comment Xuan Zhuo
  6 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-22  7:24 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Now, the premapped mode can be enabled unconditionally.

So we can remove the failover code.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 81 ++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 48 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f04b10426b8f..848e93ccf2ef 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -221,9 +221,6 @@ struct receive_queue {
 
 	/* Record the last dma info to free after new pages is allocated. */
 	struct virtnet_rq_dma *last_dma;
-
-	/* Do dma by self */
-	bool do_dma;
 };
 
 /* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -803,7 +800,7 @@ static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
 	void *buf;
 
 	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
-	if (buf && rq->do_dma)
+	if (buf)
 		virtnet_rq_unmap(rq, buf, *len);
 
 	return buf;
@@ -816,11 +813,6 @@ static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
 	u32 offset;
 	void *head;
 
-	if (!rq->do_dma) {
-		sg_init_one(rq->sg, buf, len);
-		return;
-	}
-
 	head = page_address(rq->alloc_frag.page);
 
 	offset = buf - head;
@@ -846,44 +838,42 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 
 	head = page_address(alloc_frag->page);
 
-	if (rq->do_dma) {
-		dma = head;
-
-		/* new pages */
-		if (!alloc_frag->offset) {
-			if (rq->last_dma) {
-				/* Now, the new page is allocated, the last dma
-				 * will not be used. So the dma can be unmapped
-				 * if the ref is 0.
-				 */
-				virtnet_rq_unmap(rq, rq->last_dma, 0);
-				rq->last_dma = NULL;
-			}
+	dma = head;
 
-			dma->len = alloc_frag->size - sizeof(*dma);
+	/* new pages */
+	if (!alloc_frag->offset) {
+		if (rq->last_dma) {
+			/* Now, the new page is allocated, the last dma
+			 * will not be used. So the dma can be unmapped
+			 * if the ref is 0.
+			 */
+			virtnet_rq_unmap(rq, rq->last_dma, 0);
+			rq->last_dma = NULL;
+		}
 
-			addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
-							      dma->len, DMA_FROM_DEVICE, 0);
-			if (virtqueue_dma_mapping_error(rq->vq, addr))
-				return NULL;
+		dma->len = alloc_frag->size - sizeof(*dma);
 
-			dma->addr = addr;
-			dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
+		addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
+						      dma->len, DMA_FROM_DEVICE, 0);
+		if (virtqueue_dma_mapping_error(rq->vq, addr))
+			return NULL;
 
-			/* Add a reference to dma to prevent the entire dma from
-			 * being released during error handling. This reference
-			 * will be freed after the pages are no longer used.
-			 */
-			get_page(alloc_frag->page);
-			dma->ref = 1;
-			alloc_frag->offset = sizeof(*dma);
+		dma->addr = addr;
+		dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
 
-			rq->last_dma = dma;
-		}
+		/* Add a reference to dma to prevent the entire dma from
+		 * being released during error handling. This reference
+		 * will be freed after the pages are no longer used.
+		 */
+		get_page(alloc_frag->page);
+		dma->ref = 1;
+		alloc_frag->offset = sizeof(*dma);
 
-		++dma->ref;
+		rq->last_dma = dma;
 	}
 
+	++dma->ref;
+
 	buf = head + alloc_frag->offset;
 
 	get_page(alloc_frag->page);
@@ -896,12 +886,9 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
 {
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->max_queue_pairs; i++)
 		/* this error never happens */
 		BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));
-
-		vi->rq[i].do_dma = true;
-	}
 }
 
 static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
@@ -1978,8 +1965,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
-			virtnet_rq_unmap(rq, buf, 0);
+		virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
 
@@ -2094,8 +2080,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	ctx = mergeable_len_to_ctx(len + room, headroom);
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
-			virtnet_rq_unmap(rq, buf, 0);
+		virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
 
@@ -4368,7 +4353,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 	int i;
 	for (i = 0; i < vi->max_queue_pairs; i++)
 		if (vi->rq[i].alloc_frag.page) {
-			if (vi->rq[i].do_dma && vi->rq[i].last_dma)
+			if (vi->rq[i].last_dma)
 				virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
 			put_page(vi->rq[i].alloc_frag.page);
 		}
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost v2 7/7] virtio_net: remove the misleading comment
  2024-04-22  7:24 [PATCH vhost v2 0/7] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (5 preceding siblings ...)
  2024-04-22  7:24 ` [PATCH vhost v2 6/7] virtio_net: rx remove premapped failover code Xuan Zhuo
@ 2024-04-22  7:24 ` Xuan Zhuo
  2024-04-23  3:11   ` Jason Wang
  6 siblings, 1 reply; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-22  7:24 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

We call the build_skb() actually without copying data.
The comment is misleading. So remove it.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 848e93ccf2ef..ae15254a673b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -690,7 +690,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
 	shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	/* copy small packet so we can reuse these pages */
 	if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
 		skb = virtnet_build_skb(buf, truesize, p - buf, len);
 		if (unlikely(!skb))
-- 
2.32.0.3.g01195cf9f



* Re: [PATCH vhost v2 7/7] virtio_net: remove the misleading comment
  2024-04-22  7:24 ` [PATCH vhost v2 7/7] virtio_net: remove the misleading comment Xuan Zhuo
@ 2024-04-23  3:11   ` Jason Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2024-04-23  3:11 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> We call the build_skb() actually without copying data.
> The comment is misleading. So remove it.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/net/virtio_net.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 848e93ccf2ef..ae15254a673b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -690,7 +690,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>
>         shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> -       /* copy small packet so we can reuse these pages */
>         if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
>                 skb = virtnet_build_skb(buf, truesize, p - buf, len);
>                 if (unlikely(!skb))
> --
> 2.32.0.3.g01195cf9f
>



* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-22  7:24 ` [PATCH vhost v2 4/7] virtio_net: big mode support premapped Xuan Zhuo
@ 2024-04-23  4:36   ` Jason Wang
  2024-04-23  5:47     ` Xuan Zhuo
  2024-04-23 12:31     ` Xuan Zhuo
  0 siblings, 2 replies; 20+ messages in thread
From: Jason Wang @ 2024-04-23  4:36 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> In big mode, pre-mapping DMA is beneficial because if the pages are not
> used, we can reuse them without needing to unmap and remap.
>
> We require space to store the DMA address. I use the page.dma_addr to
> store the DMA address from the pp structure inside the page.
>
> Every page retrieved from get_a_page() is mapped, and its DMA address is
> stored in page.dma_addr. When a page is returned to the chain, we check
> the DMA status; if it is not mapped (potentially having been unmapped),
> we remap it before returning it to the chain.
>
> Based on the following points, we do not use page pool to manage these
> pages:
>
> 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
>    we can only prevent the page pool from performing DMA operations, and
>    let the driver perform DMA operations on the allocated pages.
> 2. But when the page pool releases the page, we have no chance to
>    execute dma unmap.
> 3. A solution to #2 is to execute dma unmap every time before putting
>    the page back to the page pool. (This is actually a waste, we don't
>    execute unmap so frequently.)
> 4. But there is another problem, we still need to use page.dma_addr to
>    save the dma address. Using page.dma_addr while using page pool is
>    unsafe behavior.
>
> More:
>     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 108 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2c7a67ad4789..d4f5e65b247e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
>         return (struct virtio_net_common_hdr *)skb->cb;
>  }
>
> +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> +{
> +       sg->dma_address = addr;
> +       sg->length = len;
> +}
> +
> +/* For pages submitted to the ring, we need to record its dma for unmap.
> + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> + * address.
> + */
> +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> +{
> +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {

Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.

> +               p->dma_addr = lower_32_bits(addr);
> +               p->pp_magic = upper_32_bits(addr);

And this uses three fields of the page_pool sub-struct, which I'm not sure
the other maintainers are happy with. For example, re-using pp_magic might
be dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").

I think a safer way is to reuse the page pool, for example by introducing
a new flag with dma callbacks?

Thanks



* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-23  4:36   ` Jason Wang
@ 2024-04-23  5:47     ` Xuan Zhuo
  2024-04-23 12:31     ` Xuan Zhuo
  1 sibling, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-23  5:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > used, we can reuse them without needing to unmap and remap.
> >
> > We require space to store the DMA address. I use the page.dma_addr to
> > store the DMA address from the pp structure inside the page.
> >
> > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > stored in page.dma_addr. When a page is returned to the chain, we check
> > the DMA status; if it is not mapped (potentially having been unmapped),
> > we remap it before returning it to the chain.
> >
> > Based on the following points, we do not use page pool to manage these
> > pages:
> >
> > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> >    we can only prevent the page pool from performing DMA operations, and
> >    let the driver perform DMA operations on the allocated pages.
> > 2. But when the page pool releases the page, we have no chance to
> >    execute dma unmap.
> > 3. A solution to #2 is to execute dma unmap every time before putting
> >    the page back to the page pool. (This is actually a waste, we don't
> >    execute unmap so frequently.)
> > 4. But there is another problem, we still need to use page.dma_addr to
> >    save the dma address. Using page.dma_addr while using page pool is
> >    unsafe behavior.
> >
> > More:
> >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 108 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2c7a67ad4789..d4f5e65b247e 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> >         return (struct virtio_net_common_hdr *)skb->cb;
> >  }
> >
> > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > +{
> > +       sg->dma_address = addr;
> > +       sg->length = len;
> > +}
> > +
> > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > + * address.
> > + */
> > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > +{
> > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
>
> Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
>
> > +               p->dma_addr = lower_32_bits(addr);
> > +               p->pp_magic = upper_32_bits(addr);
>
> And this uses three fields on page_pool which I'm not sure the other
> maintainers are happy with. For example, re-using pp_maing might be
> dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
>
> I think a more safe way is to reuse page pool, for example introducing
> a new flag with dma callbacks?

Let me try.

Thanks.

>
> Thanks
>


* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-23  4:36   ` Jason Wang
  2024-04-23  5:47     ` Xuan Zhuo
@ 2024-04-23 12:31     ` Xuan Zhuo
  2024-04-24  0:43       ` Jason Wang
  1 sibling, 1 reply; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-23 12:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > used, we can reuse them without needing to unmap and remap.
> >
> > We require space to store the DMA address. I use the page.dma_addr to
> > store the DMA address from the pp structure inside the page.
> >
> > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > stored in page.dma_addr. When a page is returned to the chain, we check
> > the DMA status; if it is not mapped (potentially having been unmapped),
> > we remap it before returning it to the chain.
> >
> > Based on the following points, we do not use page pool to manage these
> > pages:
> >
> > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> >    we can only prevent the page pool from performing DMA operations, and
> >    let the driver perform DMA operations on the allocated pages.
> > 2. But when the page pool releases the page, we have no chance to
> >    execute dma unmap.
> > 3. A solution to #2 is to execute dma unmap every time before putting
> >    the page back to the page pool. (This is actually a waste, we don't
> >    execute unmap so frequently.)
> > 4. But there is another problem, we still need to use page.dma_addr to
> >    save the dma address. Using page.dma_addr while using page pool is
> >    unsafe behavior.
> >
> > More:
> >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 108 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2c7a67ad4789..d4f5e65b247e 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> >         return (struct virtio_net_common_hdr *)skb->cb;
> >  }
> >
> > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > +{
> > +       sg->dma_address = addr;
> > +       sg->length = len;
> > +}
> > +
> > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > + * address.
> > + */
> > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > +{
> > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
>
> Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
>
> > +               p->dma_addr = lower_32_bits(addr);
> > +               p->pp_magic = upper_32_bits(addr);
>
> And this uses three fields on page_pool which I'm not sure the other
> maintainers are happy with. For example, re-using pp_maing might be
> dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
>
> I think a more safe way is to reuse page pool, for example introducing
> a new flag with dma callbacks?

If we use page pool, how can we chain the pages allocated for a packet?

You know the "private" cannot be used.


If the pp struct inside the page is not safe, how about:

		struct {	/* Page cache and anonymous pages */
			/**
			 * @lru: Pageout list, eg. active_list protected by
			 * lruvec->lru_lock.  Sometimes used as a generic list
			 * by the page owner.
			 */
			union {
				struct list_head lru;

				/* Or, for the Unevictable "LRU list" slot */
				struct {
					/* Always even, to negate PageTail */
					void *__filler;
					/* Count page's or folio's mlocks */
					unsigned int mlock_count;
				};

				/* Or, free page */
				struct list_head buddy_list;
				struct list_head pcp_list;
			};
			/* See page-flags.h for PAGE_MAPPING_FLAGS */
			struct address_space *mapping;
			union {
				pgoff_t index;		/* Our offset within mapping. */
				unsigned long share;	/* share count for fsdax */
			};
			/**
			 * @private: Mapping-private opaque data.
			 * Usually used for buffer_heads if PagePrivate.
			 * Used for swp_entry_t if PageSwapCache.
			 * Indicates order in the buddy system if PageBuddy.
			 */
			unsigned long private;
		};

Or, we can map the private space of the page as a new structure.
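
Something like this, purely as an illustration (the struct name and layout
below are made up, not from any patch; it would have to fit within the
words of struct page that the driver owns):

struct virtnet_page_chain {
	struct page *next;	/* what page->private used to chain */
	dma_addr_t dma;		/* premapped DMA address of this page */
};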

Thanks.


>
> Thanks
>


* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-23 12:31     ` Xuan Zhuo
@ 2024-04-24  0:43       ` Jason Wang
  2024-04-24  0:53         ` Xuan Zhuo
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Wang @ 2024-04-24  0:43 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > used, we can reuse them without needing to unmap and remap.
> > >
> > > We require space to store the DMA address. I use the page.dma_addr to
> > > store the DMA address from the pp structure inside the page.
> > >
> > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > we remap it before returning it to the chain.
> > >
> > > Based on the following points, we do not use page pool to manage these
> > > pages:
> > >
> > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > >    we can only prevent the page pool from performing DMA operations, and
> > >    let the driver perform DMA operations on the allocated pages.
> > > 2. But when the page pool releases the page, we have no chance to
> > >    execute dma unmap.
> > > 3. A solution to #2 is to execute dma unmap every time before putting
> > >    the page back to the page pool. (This is actually a waste, we don't
> > >    execute unmap so frequently.)
> > > 4. But there is another problem, we still need to use page.dma_addr to
> > >    save the dma address. Using page.dma_addr while using page pool is
> > >    unsafe behavior.
> > >
> > > More:
> > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > >         return (struct virtio_net_common_hdr *)skb->cb;
> > >  }
> > >
> > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > +{
> > > +       sg->dma_address = addr;
> > > +       sg->length = len;
> > > +}
> > > +
> > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > + * address.
> > > + */
> > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > +{
> > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> >
> > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> >
> > > +               p->dma_addr = lower_32_bits(addr);
> > > +               p->pp_magic = upper_32_bits(addr);
> >
> > And this uses three fields on page_pool which I'm not sure the other
> > maintainers are happy with. For example, re-using pp_maing might be
> > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> >
> > I think a more safe way is to reuse page pool, for example introducing
> > a new flag with dma callbacks?
>
> If we use page pool, how can we chain the pages allocated for a packet?

I'm not sure I get this, it is chained via the descriptor flag.

>
> Yon know the "private" can not be used.
>
>
> If the pp struct inside the page is not safe, how about:
>
>                 struct {        /* Page cache and anonymous pages */
>                         /**
>                          * @lru: Pageout list, eg. active_list protected by
>                          * lruvec->lru_lock.  Sometimes used as a generic list
>                          * by the page owner.
>                          */
>                         union {
>                                 struct list_head lru;
>
>                                 /* Or, for the Unevictable "LRU list" slot */
>                                 struct {
>                                         /* Always even, to negate PageTail */
>                                         void *__filler;
>                                         /* Count page's or folio's mlocks */
>                                         unsigned int mlock_count;
>                                 };
>
>                                 /* Or, free page */
>                                 struct list_head buddy_list;
>                                 struct list_head pcp_list;
>                         };
>                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
>                         struct address_space *mapping;
>                         union {
>                                 pgoff_t index;          /* Our offset within mapping. */
>                                 unsigned long share;    /* share count for fsdax */
>                         };
>                         /**
>                          * @private: Mapping-private opaque data.
>                          * Usually used for buffer_heads if PagePrivate.
>                          * Used for swp_entry_t if PageSwapCache.
>                          * Indicates order in the buddy system if PageBuddy.
>                          */
>                         unsigned long private;
>                 };
>
> Or, we can map the private space of the page as a new structure.

It could be a way. But such allocation might be huge if we are using
indirect descriptors or I may miss something.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
>



* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-24  0:43       ` Jason Wang
@ 2024-04-24  0:53         ` Xuan Zhuo
  2024-04-24  2:34           ` Jason Wang
  0 siblings, 1 reply; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-24  0:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > used, we can reuse them without needing to unmap and remap.
> > > >
> > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > store the DMA address from the pp structure inside the page.
> > > >
> > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > we remap it before returning it to the chain.
> > > >
> > > > Based on the following points, we do not use page pool to manage these
> > > > pages:
> > > >
> > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > >    we can only prevent the page pool from performing DMA operations, and
> > > >    let the driver perform DMA operations on the allocated pages.
> > > > 2. But when the page pool releases the page, we have no chance to
> > > >    execute dma unmap.
> > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > >    the page back to the page pool. (This is actually a waste, we don't
> > > >    execute unmap so frequently.)
> > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > >    save the dma address. Using page.dma_addr while using page pool is
> > > >    unsafe behavior.
> > > >
> > > > More:
> > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > >  }
> > > >
> > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > +{
> > > > +       sg->dma_address = addr;
> > > > +       sg->length = len;
> > > > +}
> > > > +
> > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > + * address.
> > > > + */
> > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > +{
> > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > >
> > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > >
> > > > +               p->dma_addr = lower_32_bits(addr);
> > > > +               p->pp_magic = upper_32_bits(addr);
> > >
> > > And this uses three fields on page_pool which I'm not sure the other
> > > maintainers are happy with. For example, re-using pp_maing might be
> > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > >
> > > I think a more safe way is to reuse page pool, for example introducing
> > > a new flag with dma callbacks?
> >
> > If we use page pool, how can we chain the pages allocated for a packet?
>
> I'm not sure I get this, it is chained via the descriptor flag.


In big mode, we submit many pages to the virtio core via
virtqueue_add_inbuf().

Via virtqueue_get_buf_ctx(), we get the data; that is the first page.
The other pages are chained through "private".

If we use the page pool, how can we chain the pages?
After virtqueue_add_inbuf(), we need to get back all the pages to fill the skb.
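
Roughly, what big mode needs is something like this (a sketch, using the
page_chain_next() helper from patch 3/7; rq, len and ctx are assumed from
the surrounding receive path):

	struct page *head, *p;
	unsigned int npages = 0;

	head = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);

	/* walk the chain of pages that made up this packet */
	for (p = head; p; p = page_chain_next(p))
		npages++;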



>
> >
> > Yon know the "private" can not be used.
> >
> >
> > If the pp struct inside the page is not safe, how about:
> >
> >                 struct {        /* Page cache and anonymous pages */
> >                         /**
> >                          * @lru: Pageout list, eg. active_list protected by
> >                          * lruvec->lru_lock.  Sometimes used as a generic list
> >                          * by the page owner.
> >                          */
> >                         union {
> >                                 struct list_head lru;
> >
> >                                 /* Or, for the Unevictable "LRU list" slot */
> >                                 struct {
> >                                         /* Always even, to negate PageTail */
> >                                         void *__filler;
> >                                         /* Count page's or folio's mlocks */
> >                                         unsigned int mlock_count;
> >                                 };
> >
> >                                 /* Or, free page */
> >                                 struct list_head buddy_list;
> >                                 struct list_head pcp_list;
> >                         };
> >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> >                         struct address_space *mapping;
> >                         union {
> >                                 pgoff_t index;          /* Our offset within mapping. */
> >                                 unsigned long share;    /* share count for fsdax */
> >                         };
> >                         /**
> >                          * @private: Mapping-private opaque data.
> >                          * Usually used for buffer_heads if PagePrivate.
> >                          * Used for swp_entry_t if PageSwapCache.
> >                          * Indicates order in the buddy system if PageBuddy.
> >                          */
> >                         unsigned long private;
> >                 };
> >
> > Or, we can map the private space of the page as a new structure.
>
> It could be a way. But such allocation might be huge if we are using
> indirect descriptors or I may miss something.

No. We only need to store the "chain next" pointer and the dma address, as
this patch set does. The private space inside the page is 20 (32-bit) /
40 (64-bit) bytes. That is enough for us.

If you are worried about changes to the pp structure, we can keep using
"private" for chaining as before and use "struct list_head lru" to store
the dma address.

"struct list_head lru" is at least 8 bytes, which is enough for a
dma_addr_t.

We can do this even more simply:

static void page_chain_set_dma(struct page *p, dma_addr_t dma)
{
	dma_addr_t *addr;

	BUILD_BUG_ON(sizeof(p->lru) < sizeof(dma));

	/* Reuse the start of page->lru to hold the dma address. */
	addr = (dma_addr_t *)&p->lru;

	*addr = dma;
}
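
A matching getter (again just a sketch on the same lru-based layout) would
read it back:

static dma_addr_t page_chain_get_dma(struct page *p)
{
	return *(dma_addr_t *)&p->lru;
}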

Thanks.



>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-24  0:53         ` Xuan Zhuo
@ 2024-04-24  2:34           ` Jason Wang
  2024-04-24  2:39             ` Xuan Zhuo
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Wang @ 2024-04-24  2:34 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > used, we can reuse them without needing to unmap and remap.
> > > > >
> > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > store the DMA address from the pp structure inside the page.
> > > > >
> > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > we remap it before returning it to the chain.
> > > > >
> > > > > Based on the following points, we do not use page pool to manage these
> > > > > pages:
> > > > >
> > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > 2. But when the page pool releases the page, we have no chance to
> > > > >    execute dma unmap.
> > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > >    execute unmap so frequently.)
> > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > >    unsafe behavior.
> > > > >
> > > > > More:
> > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > >  }
> > > > >
> > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > +{
> > > > > +       sg->dma_address = addr;
> > > > > +       sg->length = len;
> > > > > +}
> > > > > +
> > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > + * address.
> > > > > + */
> > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > +{
> > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > >
> > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > >
> > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > +               p->pp_magic = upper_32_bits(addr);
> > > >
> > > > And this uses three fields on page_pool which I'm not sure the other
> > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > >
> > > > I think a more safe way is to reuse page pool, for example introducing
> > > > a new flag with dma callbacks?
> > >
> > > If we use page pool, how can we chain the pages allocated for a packet?
> >
> > I'm not sure I get this, it is chained via the descriptor flag.
>
>
> In the big mode, we will commit many pages to the virtio core by
> virtqueue_add_inbuf().
>
> By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> Other pages are chained by the "private".
>
> If we use the page pool, how can we chain the pages.
> After virtqueue_add_inbuf(), we need to get the pages to fill the skb.

Right, technically it could be solved by providing helpers in the
virtio core, but considering it's an optimization for big mode, which
is not popular, it's not worth the bother.
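
(Just to make "helpers in the virtio core" concrete: it would mean a
core-side iterator over the pages queued for one buffer, something like
the hypothetical declaration below; the name and signature are invented
here and nothing like it exists today.)

/* Hypothetical, not part of the virtio core: return the next page that
 * was queued for the buffer whose first page is @p, or NULL at the end.
 */
struct page *virtqueue_buf_next_page(struct virtqueue *vq, struct page *p);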

>
>
>
> >
> > >
> > > Yon know the "private" can not be used.
> > >
> > >
> > > If the pp struct inside the page is not safe, how about:
> > >
> > >                 struct {        /* Page cache and anonymous pages */
> > >                         /**
> > >                          * @lru: Pageout list, eg. active_list protected by
> > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > >                          * by the page owner.
> > >                          */
> > >                         union {
> > >                                 struct list_head lru;
> > >
> > >                                 /* Or, for the Unevictable "LRU list" slot */
> > >                                 struct {
> > >                                         /* Always even, to negate PageTail */
> > >                                         void *__filler;
> > >                                         /* Count page's or folio's mlocks */
> > >                                         unsigned int mlock_count;
> > >                                 };
> > >
> > >                                 /* Or, free page */
> > >                                 struct list_head buddy_list;
> > >                                 struct list_head pcp_list;
> > >                         };
> > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > >                         struct address_space *mapping;
> > >                         union {
> > >                                 pgoff_t index;          /* Our offset within mapping. */
> > >                                 unsigned long share;    /* share count for fsdax */
> > >                         };
> > >                         /**
> > >                          * @private: Mapping-private opaque data.
> > >                          * Usually used for buffer_heads if PagePrivate.
> > >                          * Used for swp_entry_t if PageSwapCache.
> > >                          * Indicates order in the buddy system if PageBuddy.
> > >                          */
> > >                         unsigned long private;
> > >                 };
> > >
> > > Or, we can map the private space of the page as a new structure.
> >
> > It could be a way. But such allocation might be huge if we are using
> > indirect descriptors or I may miss something.
>
> No. we only need to store the "chain next" and the dma as this patch set did.
> The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> That is enough for us.
>
> If you worry about the change of the pp structure, we can use the "private" as
> origin and use the "struct list_head lru" to store the dma.

This looks even worse, as it uses fields belonging to different
structures in the union.

>
> The min size of "struct list_head lru" is 8 bytes, that is enough for the
> dma_addr_t.
>
> We can do this more simper:
>
> static void page_chain_set_dma(struct page *p, dma_addr_t dma)
> {
>         BUILD_BUG_ON(sizeof(p->lru)) < sizeof(dma));
>
>         dma_addr_t *addr;
>
>         addr = &page->lru;
>
>         *addr = dma;
> }

So we have this comment in the kernel code:

	/*
	 * Five words (20/40 bytes) are available in this union.
	 * WARNING: bit 0 of the first word is used for PageTail(). That
	 * means the other users of this union MUST NOT use the bit to
	 * avoid collision and false-positive PageTail().
	 */

And looking at the discussion that introduced pp_magic, reusing fields
seems to be tricky, as we may end up with side effects from aliasing
fields in the page structure. Technically, we can invent new structures
in the union, but it might not be worth the bother.
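
(To make "invent new structures in the union" concrete: it would mean
adding something like the layout below to struct page. Nothing like this
has been proposed; the field names are made up for illustration.)

		struct {	/* illustrative only: big mode page chain */
			/* Always even, to avoid a false-positive PageTail(),
			 * as the warning above requires.
			 */
			void *__vnet_filler;
			struct page *vnet_chain_next;	/* next page of the buffer */
			dma_addr_t vnet_dma_addr;	/* premapped dma address */
		};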

So I think we can leave the fallback code and revisit this issue in the future.

Thanks

>
> Thanks.
>
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-24  2:34           ` Jason Wang
@ 2024-04-24  2:39             ` Xuan Zhuo
  2024-04-24  2:45               ` Jason Wang
  0 siblings, 1 reply; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-24  2:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > >
> > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > store the DMA address from the pp structure inside the page.
> > > > > >
> > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > we remap it before returning it to the chain.
> > > > > >
> > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > pages:
> > > > > >
> > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > >    execute dma unmap.
> > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > >    execute unmap so frequently.)
> > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > >    unsafe behavior.
> > > > > >
> > > > > > More:
> > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > >  }
> > > > > >
> > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > +{
> > > > > > +       sg->dma_address = addr;
> > > > > > +       sg->length = len;
> > > > > > +}
> > > > > > +
> > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > + * address.
> > > > > > + */
> > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > +{
> > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > >
> > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > >
> > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > >
> > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > >
> > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > a new flag with dma callbacks?
> > > >
> > > > If we use page pool, how can we chain the pages allocated for a packet?
> > >
> > > I'm not sure I get this, it is chained via the descriptor flag.
> >
> >
> > In the big mode, we will commit many pages to the virtio core by
> > virtqueue_add_inbuf().
> >
> > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > Other pages are chained by the "private".
> >
> > If we use the page pool, how can we chain the pages.
> > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
>
> Right, technically it could be solved by providing helpers in the
> virtio core, but considering it's an optimization for big mode which
> is not popular, it's not worth to bother.
>
> >
> >
> >
> > >
> > > >
> > > > Yon know the "private" can not be used.
> > > >
> > > >
> > > > If the pp struct inside the page is not safe, how about:
> > > >
> > > >                 struct {        /* Page cache and anonymous pages */
> > > >                         /**
> > > >                          * @lru: Pageout list, eg. active_list protected by
> > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > >                          * by the page owner.
> > > >                          */
> > > >                         union {
> > > >                                 struct list_head lru;
> > > >
> > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > >                                 struct {
> > > >                                         /* Always even, to negate PageTail */
> > > >                                         void *__filler;
> > > >                                         /* Count page's or folio's mlocks */
> > > >                                         unsigned int mlock_count;
> > > >                                 };
> > > >
> > > >                                 /* Or, free page */
> > > >                                 struct list_head buddy_list;
> > > >                                 struct list_head pcp_list;
> > > >                         };
> > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > >                         struct address_space *mapping;
> > > >                         union {
> > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > >                                 unsigned long share;    /* share count for fsdax */
> > > >                         };
> > > >                         /**
> > > >                          * @private: Mapping-private opaque data.
> > > >                          * Usually used for buffer_heads if PagePrivate.
> > > >                          * Used for swp_entry_t if PageSwapCache.
> > > >                          * Indicates order in the buddy system if PageBuddy.
> > > >                          */
> > > >                         unsigned long private;
> > > >                 };
> > > >
> > > > Or, we can map the private space of the page as a new structure.
> > >
> > > It could be a way. But such allocation might be huge if we are using
> > > indirect descriptors or I may miss something.
> >
> > No. we only need to store the "chain next" and the dma as this patch set did.
> > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > That is enough for us.
> >
> > If you worry about the change of the pp structure, we can use the "private" as
> > origin and use the "struct list_head lru" to store the dma.
>
> This looks even worse, as it uses fields belonging to the different
> structures in the union.

I mean we do not use the fields from the pp structure inside the page,
if we are worried about changes to the pp structure.

I mean using "private" and "lru", which are in the same structure.

I think this is a good way.

Thanks.

>
> >
> > The min size of "struct list_head lru" is 8 bytes, that is enough for the
> > dma_addr_t.
> >
> > We can do this more simper:
> >
> > static void page_chain_set_dma(struct page *p, dma_addr_t dma)
> > {
> >         BUILD_BUG_ON(sizeof(p->lru)) < sizeof(dma));
> >
> >         dma_addr_t *addr;
> >
> >         addr = &page->lru;
> >
> >         *addr = dma;
> > }
>
> So we had this in the kernel code.
>
>        /*
>          * Five words (20/40 bytes) are available in this union.
>          * WARNING: bit 0 of the first word is used for PageTail(). That
>          * means the other users of this union MUST NOT use the bit to
>          * avoid collision and false-positive PageTail().
>          */
>
> And by looking at the discussion that introduces the pp_magic, reusing
> fields seems to be tricky as we may end up with side effects of
> aliasing fields in page structure. Technically, we can invent new
> structures in the union, but it might not be worth it to bother.
>
> So I think we can leave the fallback code and revisit this issue in the future.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-24  2:39             ` Xuan Zhuo
@ 2024-04-24  2:45               ` Jason Wang
  2024-04-24  2:54                 ` Xuan Zhuo
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Wang @ 2024-04-24  2:45 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > >
> > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > >
> > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > we remap it before returning it to the chain.
> > > > > > >
> > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > pages:
> > > > > > >
> > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > >    execute dma unmap.
> > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > >    execute unmap so frequently.)
> > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > >    unsafe behavior.
> > > > > > >
> > > > > > > More:
> > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > +{
> > > > > > > +       sg->dma_address = addr;
> > > > > > > +       sg->length = len;
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > + * address.
> > > > > > > + */
> > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > +{
> > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > >
> > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > >
> > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > >
> > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > >
> > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > a new flag with dma callbacks?
> > > > >
> > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > >
> > > > I'm not sure I get this, it is chained via the descriptor flag.
> > >
> > >
> > > In the big mode, we will commit many pages to the virtio core by
> > > virtqueue_add_inbuf().
> > >
> > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > Other pages are chained by the "private".
> > >
> > > If we use the page pool, how can we chain the pages.
> > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> >
> > Right, technically it could be solved by providing helpers in the
> > virtio core, but considering it's an optimization for big mode which
> > is not popular, it's not worth to bother.
> >
> > >
> > >
> > >
> > > >
> > > > >
> > > > > Yon know the "private" can not be used.
> > > > >
> > > > >
> > > > > If the pp struct inside the page is not safe, how about:
> > > > >
> > > > >                 struct {        /* Page cache and anonymous pages */
> > > > >                         /**
> > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > >                          * by the page owner.
> > > > >                          */
> > > > >                         union {
> > > > >                                 struct list_head lru;
> > > > >
> > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > >                                 struct {
> > > > >                                         /* Always even, to negate PageTail */
> > > > >                                         void *__filler;
> > > > >                                         /* Count page's or folio's mlocks */
> > > > >                                         unsigned int mlock_count;
> > > > >                                 };
> > > > >
> > > > >                                 /* Or, free page */
> > > > >                                 struct list_head buddy_list;
> > > > >                                 struct list_head pcp_list;
> > > > >                         };
> > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > >                         struct address_space *mapping;
> > > > >                         union {
> > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > >                         };
> > > > >                         /**
> > > > >                          * @private: Mapping-private opaque data.
> > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > >                          */
> > > > >                         unsigned long private;
> > > > >                 };
> > > > >
> > > > > Or, we can map the private space of the page as a new structure.
> > > >
> > > > It could be a way. But such allocation might be huge if we are using
> > > > indirect descriptors or I may miss something.
> > >
> > > No. we only need to store the "chain next" and the dma as this patch set did.
> > > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > > That is enough for us.
> > >
> > > If you worry about the change of the pp structure, we can use the "private" as
> > > origin and use the "struct list_head lru" to store the dma.
> >
> > This looks even worse, as it uses fields belonging to the different
> > structures in the union.
>
> I mean we do not use the elems from the pp structure inside the page,
> if we worry the change of the pp structure.
>
> I mean use the "private" and "lru", these in the same structure.
>
> I think this is a good way.
>
> Thanks.

See this:

https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/

Thanks


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-24  2:45               ` Jason Wang
@ 2024-04-24  2:54                 ` Xuan Zhuo
  2024-04-24  3:50                   ` Jason Wang
  0 siblings, 1 reply; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-24  2:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > >
> > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > >
> > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > we remap it before returning it to the chain.
> > > > > > > >
> > > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > > pages:
> > > > > > > >
> > > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > > >    execute dma unmap.
> > > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > > >    execute unmap so frequently.)
> > > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > > >    unsafe behavior.
> > > > > > > >
> > > > > > > > More:
> > > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > +{
> > > > > > > > +       sg->dma_address = addr;
> > > > > > > > +       sg->length = len;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > > + * address.
> > > > > > > > + */
> > > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > > +{
> > > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > > >
> > > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > > >
> > > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > > >
> > > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > > >
> > > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > > a new flag with dma callbacks?
> > > > > >
> > > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > > >
> > > > > I'm not sure I get this, it is chained via the descriptor flag.
> > > >
> > > >
> > > > In the big mode, we will commit many pages to the virtio core by
> > > > virtqueue_add_inbuf().
> > > >
> > > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > > Other pages are chained by the "private".
> > > >
> > > > If we use the page pool, how can we chain the pages.
> > > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> > >
> > > Right, technically it could be solved by providing helpers in the
> > > virtio core, but considering it's an optimization for big mode which
> > > is not popular, it's not worth to bother.
> > >
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Yon know the "private" can not be used.
> > > > > >
> > > > > >
> > > > > > If the pp struct inside the page is not safe, how about:
> > > > > >
> > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > >                         /**
> > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > >                          * by the page owner.
> > > > > >                          */
> > > > > >                         union {
> > > > > >                                 struct list_head lru;
> > > > > >
> > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > >                                 struct {
> > > > > >                                         /* Always even, to negate PageTail */
> > > > > >                                         void *__filler;
> > > > > >                                         /* Count page's or folio's mlocks */
> > > > > >                                         unsigned int mlock_count;
> > > > > >                                 };
> > > > > >
> > > > > >                                 /* Or, free page */
> > > > > >                                 struct list_head buddy_list;
> > > > > >                                 struct list_head pcp_list;
> > > > > >                         };
> > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > >                         struct address_space *mapping;
> > > > > >                         union {
> > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > >                         };
> > > > > >                         /**
> > > > > >                          * @private: Mapping-private opaque data.
> > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > >                          */
> > > > > >                         unsigned long private;
> > > > > >                 };
> > > > > >
> > > > > > Or, we can map the private space of the page as a new structure.
> > > > >
> > > > > It could be a way. But such allocation might be huge if we are using
> > > > > indirect descriptors or I may miss something.
> > > >
> > > > No. we only need to store the "chain next" and the dma as this patch set did.
> > > > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > > > That is enough for us.
> > > >
> > > > If you worry about the change of the pp structure, we can use the "private" as
> > > > origin and use the "struct list_head lru" to store the dma.
> > >
> > > This looks even worse, as it uses fields belonging to the different
> > > structures in the union.
> >
> > I mean we do not use the elems from the pp structure inside the page,
> > if we worry the change of the pp structure.
> >
> > I mean use the "private" and "lru", these in the same structure.
> >
> > I think this is a good way.
> >
> > Thanks.
>
> See this:
>
> https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/


I think that is because the page pool shares the page with the skbs.
I'm not entirely sure.

In our case, virtio-net fully owns the page. After the page is referenced by
the skb, virtio-net no longer references the page. I don't think there is any
problem here.

The key point is: whoever owns the page can use the page's private space
(20/40 bytes).

Is that right?

Thanks.


>
> Thanks
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-24  2:54                 ` Xuan Zhuo
@ 2024-04-24  3:50                   ` Jason Wang
  2024-04-24  5:43                     ` Xuan Zhuo
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Wang @ 2024-04-24  3:50 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Wed, Apr 24, 2024 at 10:58 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > > >
> > > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > > >
> > > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > > we remap it before returning it to the chain.
> > > > > > > > >
> > > > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > > > pages:
> > > > > > > > >
> > > > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > > > >    execute dma unmap.
> > > > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > > > >    execute unmap so frequently.)
> > > > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > > > >    unsafe behavior.
> > > > > > > > >
> > > > > > > > > More:
> > > > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > > +{
> > > > > > > > > +       sg->dma_address = addr;
> > > > > > > > > +       sg->length = len;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > > > + * address.
> > > > > > > > > + */
> > > > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > > > +{
> > > > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > > > >
> > > > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > > > >
> > > > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > > > >
> > > > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > > > >
> > > > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > > > a new flag with dma callbacks?
> > > > > > >
> > > > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > > > >
> > > > > > I'm not sure I get this, it is chained via the descriptor flag.
> > > > >
> > > > >
> > > > > In the big mode, we will commit many pages to the virtio core by
> > > > > virtqueue_add_inbuf().
> > > > >
> > > > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > > > Other pages are chained by the "private".
> > > > >
> > > > > If we use the page pool, how can we chain the pages.
> > > > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> > > >
> > > > Right, technically it could be solved by providing helpers in the
> > > > virtio core, but considering it's an optimization for big mode which
> > > > is not popular, it's not worth to bother.
> > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Yon know the "private" can not be used.
> > > > > > >
> > > > > > >
> > > > > > > If the pp struct inside the page is not safe, how about:
> > > > > > >
> > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > >                         /**
> > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > >                          * by the page owner.
> > > > > > >                          */
> > > > > > >                         union {
> > > > > > >                                 struct list_head lru;
> > > > > > >
> > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > >                                 struct {
> > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > >                                         void *__filler;
> > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > >                                         unsigned int mlock_count;
> > > > > > >                                 };
> > > > > > >
> > > > > > >                                 /* Or, free page */
> > > > > > >                                 struct list_head buddy_list;
> > > > > > >                                 struct list_head pcp_list;
> > > > > > >                         };
> > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > >                         struct address_space *mapping;
> > > > > > >                         union {
> > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > >                         };
> > > > > > >                         /**
> > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > >                          */
> > > > > > >                         unsigned long private;
> > > > > > >                 };
> > > > > > >
> > > > > > > Or, we can map the private space of the page as a new structure.
> > > > > >
> > > > > > It could be a way. But such allocation might be huge if we are using
> > > > > > indirect descriptors or I may miss something.
> > > > >
> > > > > No. we only need to store the "chain next" and the dma as this patch set did.
> > > > > The size of the private space inside the page is  20(32bit)/40(64bit) bytes.
> > > > > That is enough for us.
> > > > >
> > > > > If you worry about the change of the pp structure, we can use the "private" as
> > > > > origin and use the "struct list_head lru" to store the dma.
> > > >
> > > > This looks even worse, as it uses fields belonging to the different
> > > > structures in the union.
> > >
> > > I mean we do not use the elems from the pp structure inside the page,
> > > if we worry the change of the pp structure.
> > >
> > > I mean use the "private" and "lru", these in the same structure.
> > >
> > > I think this is a good way.
> > >
> > > Thanks.
> >
> > See this:
> >
> > https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/
>
>
> I think that is because that the page pool will share the page with
> the skbs.  I'm not entirely sure.
>
> In our case, virtio-net fully owns the page. After the page is referenced by skb,
> virtio-net no longer references the page. I don't think there is any problem
> here.

Well, in the rx path, though the page is allocated by virtio-net,
unlike with the page pool those pages are not freed by virtio-net. So it
may leave things behind in the page structure, which is problematic. I don't
think we can introduce a virtio-net specific hook for kfree_skb() in
this case. That's why I think leveraging the page pool is better.

As for reusing the page pool: maybe we can reuse __pp_mapping_pad for
virtio-net specific use cases like chaining, and clear it in
page_pool_clear_pp_info(). And we need to make sure we don't break
things like TCP RX zerocopy, since mapping is aliased with
__pp_mapping_pad at first glance.
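
(A very rough sketch of that idea; the padding word is spelled
_pp_mapping_pad in struct page if I recall the field name correctly, and
this is untested and would still need page_pool/mm review.)

static void page_chain_add(struct page *p, struct page *next)
{
	/* Chain through the pp padding word. It overlaps page->mapping,
	 * so nothing that consults page->mapping may run while the chain
	 * is live, and the page pool would have to clear it (e.g. in
	 * page_pool_clear_pp_info()) before the page leaves pp ownership.
	 */
	p->_pp_mapping_pad = (unsigned long)next;
}

static struct page *page_chain_next(struct page *p)
{
	return (struct page *)p->_pp_mapping_pad;
}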

>
> The key is that who owns the page, who can use the page private space (20/40 bytes).
>
> Is that?

I'm not saying we can't investigate in this direction. But it needs
more input from the mm people, and we need to evaluate the price we pay
for that.

The motivation is to drop the fallback code used when premapping is not
supported, to improve the maintainability of the code and to ease the
AF_XDP support for virtio-net. But it turns out not to be easy.

Considering the rx fallback code we need to maintain is not too large,
maybe we can leave it as is and, for example, forbid AF_XDP in big mode.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vhost v2 4/7] virtio_net: big mode support premapped
  2024-04-24  3:50                   ` Jason Wang
@ 2024-04-24  5:43                     ` Xuan Zhuo
  0 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-04-24  5:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Wed, 24 Apr 2024 11:50:44 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 10:58 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > > > >
> > > > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > > > >
> > > > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > > > we remap it before returning it to the chain.
> > > > > > > > > >
> > > > > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > > > > pages:
> > > > > > > > > >
> > > > > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > > > > >    execute dma unmap.
> > > > > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > > > > >    execute unmap so frequently.)
> > > > > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > > > > >    unsafe behavior.
> > > > > > > > > >
> > > > > > > > > > More:
> > > > > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > > > +{
> > > > > > > > > > +       sg->dma_address = addr;
> > > > > > > > > > +       sg->length = len;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > > > > + * address.
> > > > > > > > > > + */
> > > > > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > > > > +{
> > > > > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > > > > >
> > > > > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > > > > >
> > > > > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > > > > >
> > > > > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > > > > maintainers are happy with. For example, re-using pp_maing might be
> > > > > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > > > > >
> > > > > > > > > I think a more safe way is to reuse page pool, for example introducing
> > > > > > > > > a new flag with dma callbacks?
> > > > > > > >
> > > > > > > > If we use page pool, how can we chain the pages allocated for a packet?
> > > > > > >
> > > > > > > I'm not sure I get this, it is chained via the descriptor flag.
> > > > > >
> > > > > >
> > > > > > In the big mode, we will commit many pages to the virtio core by
> > > > > > virtqueue_add_inbuf().
> > > > > >
> > > > > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > > > > Other pages are chained by the "private".
> > > > > >
> > > > > > If we use the page pool, how can we chain the pages.
> > > > > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> > > > >
> > > > > Right, technically it could be solved by providing helpers in the
> > > > > virtio core, but considering it's an optimization for big mode, which
> > > > > is not popular, it's probably not worth the trouble.
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > You know "private" cannot be used.
> > > > > > > >
> > > > > > > >
> > > > > > > > If the pp struct inside the page is not safe, how about:
> > > > > > > >
> > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > >                         /**
> > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > >                          * by the page owner.
> > > > > > > >                          */
> > > > > > > >                         union {
> > > > > > > >                                 struct list_head lru;
> > > > > > > >
> > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > >                                 struct {
> > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > >                                         void *__filler;
> > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > >                                         unsigned int mlock_count;
> > > > > > > >                                 };
> > > > > > > >
> > > > > > > >                                 /* Or, free page */
> > > > > > > >                                 struct list_head buddy_list;
> > > > > > > >                                 struct list_head pcp_list;
> > > > > > > >                         };
> > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > >                         struct address_space *mapping;
> > > > > > > >                         union {
> > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > >                         };
> > > > > > > >                         /**
> > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > >                          */
> > > > > > > >                         unsigned long private;
> > > > > > > >                 };
> > > > > > > >
> > > > > > > > Or, we could overlay a new structure on that private space of the page.
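> > > > > > > >
> > > > > > > > For example, a hypothetical layout just to illustrate the idea
> > > > > > > > (virtnet_page_priv is not an existing structure):
> > > > > > > >
> > > > > > > >     struct virtnet_page_priv {
> > > > > > > >             struct page *next;      /* next page of the same packet */
> > > > > > > >             dma_addr_t addr;        /* premapped dma address */
> > > > > > > >     };
> > > > > > > >
> > > > > > > >     static struct virtnet_page_priv *page_priv(struct page *p)
> > > > > > > >     {
> > > > > > > >             /* the "Page cache and anonymous pages" words give us
> > > > > > > >              * 20 bytes (32-bit) / 40 bytes (64-bit) to play with
> > > > > > > >              */
> > > > > > > >             BUILD_BUG_ON(sizeof(struct virtnet_page_priv) > 20);
> > > > > > > >             return (struct virtnet_page_priv *)&p->lru;
> > > > > > > >     }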
> > > > > > >
> > > > > > > It could be a way. But such an allocation might be huge if we are using
> > > > > > > indirect descriptors, or I may be missing something.
> > > > > >
> > > > > > No, we only need to store the "chain next" pointer and the dma address,
> > > > > > as this patch set does. The private space inside the page is
> > > > > > 20 (32-bit) / 40 (64-bit) bytes. That is enough for us.
> > > > > >
> > > > > > If you worry about changes to the pp structure, we can keep using
> > > > > > "private" as before and use "struct list_head lru" to store the dma
> > > > > > address, as sketched below.
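> > > > > >
> > > > > > A rough illustration of that variant (an untested sketch; it reuses the
> > > > > > helper names of this series, but stores the address in "lru" instead of
> > > > > > dma_addr/pp_magic):
> > > > > >
> > > > > >     /* "private" still holds the next page of the chain; the dma
> > > > > >      * address is stashed in the words normally used by "lru".
> > > > > >      */
> > > > > >     static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > >     {
> > > > > >             BUILD_BUG_ON(sizeof(addr) > sizeof(p->lru));
> > > > > >             memcpy(&p->lru, &addr, sizeof(addr));
> > > > > >     }
> > > > > >
> > > > > >     static dma_addr_t page_chain_get_dma(struct page *p)
> > > > > >     {
> > > > > >             dma_addr_t addr;
> > > > > >
> > > > > >             memcpy(&addr, &p->lru, sizeof(addr));
> > > > > >             return addr;
> > > > > >     }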
> > > > >
> > > > > This looks even worse, as it uses fields belonging to different
> > > > > structures within the union.
> > > >
> > > > I mean we would not use the fields from the pp structure inside the page,
> > > > if we worry about changes to the pp structure.
> > > >
> > > > I mean using "private" and "lru", which are in the same structure.
> > > >
> > > > I think this is a good way.
> > > >
> > > > Thanks.
> > >
> > > See this:
> > >
> > > https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/
> >
> >
> > I think that is because the page pool shares the page with the skbs.
> > I'm not entirely sure.
> >
> > In our case, virtio-net fully owns the page. After the page is referenced
> > by the skb, virtio-net no longer references it. I don't think there is any
> > problem here.
>
> Well, in the rx path, although the pages are allocated by virtio-net,
> unlike with the page pool they are not freed by virtio-net. So stale
> values may be left in the page structure, which could be problematic.
> I don't think we can introduce a virtio-net specific hook for kfree_skb()
> in this case. That's why I think leveraging the page pool is better.
>
> As for reusing the page pool: maybe we can reuse __pp_mapping_pad for
> virtio-net specific use cases like chaining, and clear it in
> page_pool_clear_pp_info(). We also need to make sure we don't break
> things like TCP RX zerocopy, since at first glance "mapping" is aliased
> with __pp_mapping_pad.
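>
> Something along these lines, as a very rough sketch (untested; the exact
> field name and struct page layout need to be double checked):
>
>     /* store the "next page" pointer of a chain in the pp mapping pad;
>      * page_pool_clear_pp_info() would have to clear it again before
>      * the page leaves the pool
>      */
>     static void virtnet_pp_chain_set_next(struct page *p, struct page *next)
>     {
>             p->__pp_mapping_pad = (unsigned long)next;
>     }
>
>     static struct page *virtnet_pp_chain_next(struct page *p)
>     {
>             return (struct page *)p->__pp_mapping_pad;
>     }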
>
> >
> > The key is who owns the page: the owner can use the page's private space (20/40 bytes).
> >
> > Is that right?
>
> I'm not saying we can't investigate in this direction. But it needs
> more input from the mm maintainers, and we need to evaluate the price
> we pay for it.
>
> The motivation is to drop the fallback code used when premapping is not
> supported, to improve the maintainability of the code and to ease the
> AF_XDP support for virtio-net. But it turns out not to be easy.
>
> Considering the rx fallback code we need to maintain is not too large,
> maybe we can leave it as is and, for example, forbid AF_XDP in big mode.

I see.

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>


end of thread, other threads:[~2024-04-24  5:44 UTC | newest]

Thread overview: 20+ messages
-- links below jump to the message on this page --
2024-04-22  7:24 [PATCH vhost v2 0/7] virtio_net: rx enable premapped mode by default Xuan Zhuo
2024-04-22  7:24 ` [PATCH vhost v2 1/7] virtio_ring: introduce dma map api for page Xuan Zhuo
2024-04-22  7:24 ` [PATCH vhost v2 2/7] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
2024-04-22  7:24 ` [PATCH vhost v2 3/7] virtio_net: replace private by pp struct inside page Xuan Zhuo
2024-04-22  7:24 ` [PATCH vhost v2 4/7] virtio_net: big mode support premapped Xuan Zhuo
2024-04-23  4:36   ` Jason Wang
2024-04-23  5:47     ` Xuan Zhuo
2024-04-23 12:31     ` Xuan Zhuo
2024-04-24  0:43       ` Jason Wang
2024-04-24  0:53         ` Xuan Zhuo
2024-04-24  2:34           ` Jason Wang
2024-04-24  2:39             ` Xuan Zhuo
2024-04-24  2:45               ` Jason Wang
2024-04-24  2:54                 ` Xuan Zhuo
2024-04-24  3:50                   ` Jason Wang
2024-04-24  5:43                     ` Xuan Zhuo
2024-04-22  7:24 ` [PATCH vhost v2 5/7] virtio_net: enable premapped by default Xuan Zhuo
2024-04-22  7:24 ` [PATCH vhost v2 6/7] virtio_net: rx remove premapped failover code Xuan Zhuo
2024-04-22  7:24 ` [PATCH vhost v2 7/7] virtio_net: remove the misleading comment Xuan Zhuo
2024-04-23  3:11   ` Jason Wang
