* [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default
@ 2024-04-11  2:51 Xuan Zhuo
  2024-04-11  2:51 ` [PATCH vhost 1/6] virtio_ring: introduce dma map api for page Xuan Zhuo
                   ` (5 more replies)
  0 siblings, 6 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-11  2:51 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

For the virtio drivers, we can enable premapped mode regardless of the
value of use_dma_api, because we provide the virtio DMA APIs. So the
driver can enable premapped mode unconditionally.

This patch set makes the big mode of virtio-net support premapped mode
and enables premapped mode for rx by default.

Please review.

Thanks.

Xuan Zhuo (6):
  virtio_ring: introduce dma map api for page
  virtio_ring: enable premapped mode whatever use_dma_api
  virtio_net: replace private by pp struct inside page
  virtio_net: big mode support premapped
  virtio_net: enable premapped by default
  virtio_net: rx remove premapped failover code

 drivers/net/virtio_net.c     | 213 ++++++++++++++++++++++-------------
 drivers/virtio/virtio_ring.c |  59 +++++++++-
 include/linux/virtio.h       |   7 ++
 3 files changed, 192 insertions(+), 87 deletions(-)

--
2.32.0.3.g01195cf9f



* [PATCH vhost 1/6] virtio_ring: introduce dma map api for page
  2024-04-11  2:51 [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default Xuan Zhuo
@ 2024-04-11  2:51 ` Xuan Zhuo
  2024-04-11 11:45   ` Alexander Lobakin
  2024-04-18  6:08   ` Jason Wang
  2024-04-11  2:51 ` [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-11  2:51 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

The virtio-net big mode sq will use these APIs to map the pages.

dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
                                       size_t offset, size_t size,
                                       enum dma_data_direction dir,
                                       unsigned long attrs);
void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
                                   size_t size, enum dma_data_direction dir,
                                   unsigned long attrs);
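
A rough usage sketch (illustration only, not part of the patch; it
assumes a virtqueue that has already been switched to premapped mode and
a page owned by the caller):

static int example_map_page(struct virtqueue *vq, struct page *page)
{
	dma_addr_t addr;

	addr = virtqueue_dma_map_page_attrs(vq, page, 0, PAGE_SIZE,
					    DMA_FROM_DEVICE, 0);
	if (virtqueue_dma_mapping_error(vq, addr))
		return -ENOMEM;

	/* hand addr (not a virtual address) to the premapped virtqueue ... */

	virtqueue_dma_unmap_page_attrs(vq, addr, PAGE_SIZE,
				       DMA_FROM_DEVICE, 0);
	return 0;
}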

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 52 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  7 +++++
 2 files changed, 59 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 70de1a9a81a3..1b9fb680cff3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -3100,6 +3100,58 @@ void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
 }
 EXPORT_SYMBOL_GPL(virtqueue_dma_unmap_single_attrs);
 
+/**
+ * virtqueue_dma_map_page_attrs - map DMA for _vq
+ * @_vq: the struct virtqueue we're talking about.
+ * @page: the page to do dma
+ * @offset: the offset inside the page
+ * @size: the size of the page to do dma
+ * @dir: DMA direction
+ * @attrs: DMA Attrs
+ *
+ * The caller calls this to do dma mapping in advance. The DMA address can be
+ * passed to this _vq when it is in pre-mapped mode.
+ *
+ * return DMA address. Caller should check that by virtqueue_dma_mapping_error().
+ */
+dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
+					size_t offset, size_t size,
+					enum dma_data_direction dir,
+					unsigned long attrs)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (!vq->use_dma_api)
+		return page_to_phys(page) + offset;
+
+	return dma_map_page_attrs(vring_dma_dev(vq), page, offset, size, dir, attrs);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_map_page_attrs);
+
+/**
+ * virtqueue_dma_unmap_page_attrs - unmap DMA for _vq
+ * @_vq: the struct virtqueue we're talking about.
+ * @addr: the dma address to unmap
+ * @size: the size of the buffer
+ * @dir: DMA direction
+ * @attrs: DMA Attrs
+ *
+ * Unmap the address that is mapped by the virtqueue_dma_map_* APIs.
+ *
+ */
+void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
+				    size_t size, enum dma_data_direction dir,
+				    unsigned long attrs)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (!vq->use_dma_api)
+		return;
+
+	dma_unmap_page_attrs(vring_dma_dev(vq), addr, size, dir, attrs);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_unmap_page_attrs);
+
 /**
  * virtqueue_dma_mapping_error - check dma address
  * @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 26c4325aa373..d6c699553979 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -228,6 +228,13 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, size
 void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
 				      size_t size, enum dma_data_direction dir,
 				      unsigned long attrs);
+dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
+					size_t offset, size_t size,
+					enum dma_data_direction dir,
+					unsigned long attrs);
+void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
+				    size_t size, enum dma_data_direction dir,
+				    unsigned long attrs);
 int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);
 
 bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api
  2024-04-11  2:51 [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default Xuan Zhuo
  2024-04-11  2:51 ` [PATCH vhost 1/6] virtio_ring: introduce dma map api for page Xuan Zhuo
@ 2024-04-11  2:51 ` Xuan Zhuo
  2024-04-18  6:09   ` Jason Wang
  2024-04-18  6:13   ` Jason Wang
  2024-04-11  2:51 ` [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page Xuan Zhuo
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-11  2:51 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Now that we have the virtio DMA APIs, the driver can use premapped mode
regardless of whether the virtio core uses the DMA API or not.

So remove the use_dma_api check from virtqueue_set_dma_premapped().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1b9fb680cff3..72c438c5f7d7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2730,7 +2730,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  *
  * Returns zero or a negative error.
  * 0: success.
- * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
+ * -EINVAL: NOT called immediately.
  */
 int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 {
@@ -2746,11 +2746,6 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 		return -EINVAL;
 	}
 
-	if (!vq->use_dma_api) {
-		END_USE(vq);
-		return -EINVAL;
-	}
-
 	vq->premapped = true;
 	vq->do_unmap = false;
 
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-11  2:51 [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default Xuan Zhuo
  2024-04-11  2:51 ` [PATCH vhost 1/6] virtio_ring: introduce dma map api for page Xuan Zhuo
  2024-04-11  2:51 ` [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
@ 2024-04-11  2:51 ` Xuan Zhuo
  2024-04-12  4:47   ` Jason Wang
  2024-04-18  6:11   ` Jason Wang
  2024-04-11  2:51 ` [PATCH vhost 4/6] virtio_net: big mode support premapped Xuan Zhuo
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-11  2:51 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Now, we chain the pages of big mode by the page's private variable.
But a subsequent patch aims to make the big mode to support
premapped mode. This requires additional space to store the dma addr.

Within the sub-struct that contains the 'private', there is no suitable
variable for storing the DMA addr.

		struct {	/* Page cache and anonymous pages */
			/**
			 * @lru: Pageout list, eg. active_list protected by
			 * lruvec->lru_lock.  Sometimes used as a generic list
			 * by the page owner.
			 */
			union {
				struct list_head lru;

				/* Or, for the Unevictable "LRU list" slot */
				struct {
					/* Always even, to negate PageTail */
					void *__filler;
					/* Count page's or folio's mlocks */
					unsigned int mlock_count;
				};

				/* Or, free page */
				struct list_head buddy_list;
				struct list_head pcp_list;
			};
			/* See page-flags.h for PAGE_MAPPING_FLAGS */
			struct address_space *mapping;
			union {
				pgoff_t index;		/* Our offset within mapping. */
				unsigned long share;	/* share count for fsdax */
			};
			/**
			 * @private: Mapping-private opaque data.
			 * Usually used for buffer_heads if PagePrivate.
			 * Used for swp_entry_t if PageSwapCache.
			 * Indicates order in the buddy system if PageBuddy.
			 */
			unsigned long private;
		};

But within the page pool struct, we have a variable called
dma_addr that is appropriate for storing dma addr.
And that struct is used by netstack. That works to our advantage.

		struct {	/* page_pool used by netstack */
			/**
			 * @pp_magic: magic value to avoid recycling non
			 * page_pool allocated pages.
			 */
			unsigned long pp_magic;
			struct page_pool *pp;
			unsigned long _pp_mapping_pad;
			unsigned long dma_addr;
			atomic_long_t pp_ref_count;
		};

On the other side, we should use variables from the same sub-struct.
So this patch replaces the "private" with "pp".
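
A minimal sketch of the result (illustration only; it mirrors the
page_chain_next()/page_chain_add() helpers added in the diff below and
assumes the driver's struct receive_queue):

/* Illustration: push a page onto the per-queue chain and count its length. */
static int example_chain_len(struct receive_queue *rq, struct page *page)
{
	struct page *p;
	int n = 0;

	page_chain_add(page, rq->pages);	/* page->pp = (void *)rq->pages */
	rq->pages = page;

	for (p = rq->pages; p; p = page_chain_next(p))	/* (struct page *)p->pp */
		n++;

	return n;
}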

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d1118a133..4446fb54de6d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -48,6 +48,9 @@ module_param(napi_tx, bool, 0644);
 
 #define VIRTIO_XDP_FLAG	BIT(0)
 
+#define page_chain_next(p)	((struct page *)((p)->pp))
+#define page_chain_add(p, n)	((p)->pp = (void *)n)
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -191,7 +194,7 @@ struct receive_queue {
 
 	struct virtnet_interrupt_coalesce intr_coal;
 
-	/* Chain pages by the private ptr. */
+	/* Chain pages by the page's pp struct. */
 	struct page *pages;
 
 	/* Average packet length for mergeable receive buffers. */
@@ -432,16 +435,16 @@ skb_vnet_common_hdr(struct sk_buff *skb)
 }
 
 /*
- * private is used to chain pages for big packets, put the whole
- * most recent used list in the beginning for reuse
+ * put the whole most recent used list in the beginning for reuse
  */
 static void give_pages(struct receive_queue *rq, struct page *page)
 {
 	struct page *end;
 
 	/* Find end of list, sew whole thing into vi->rq.pages. */
-	for (end = page; end->private; end = (struct page *)end->private);
-	end->private = (unsigned long)rq->pages;
+	for (end = page; page_chain_next(end); end = page_chain_next(end));
+
+	page_chain_add(end, rq->pages);
 	rq->pages = page;
 }
 
@@ -450,9 +453,9 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	struct page *p = rq->pages;
 
 	if (p) {
-		rq->pages = (struct page *)p->private;
-		/* clear private here, it is used to chain pages */
-		p->private = 0;
+		rq->pages = page_chain_next(p);
+		/* clear chain here, it is used to chain pages */
+		page_chain_add(p, NULL);
 	} else
 		p = alloc_page(gfp_mask);
 	return p;
@@ -609,7 +612,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		if (unlikely(!skb))
 			return NULL;
 
-		page = (struct page *)page->private;
+		page = page_chain_next(page);
 		if (page)
 			give_pages(rq, page);
 		goto ok;
@@ -657,7 +660,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
 				frag_size, truesize);
 		len -= frag_size;
-		page = (struct page *)page->private;
+		page = page_chain_next(page);
 		offset = 0;
 	}
 
@@ -1909,7 +1912,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 		sg_set_buf(&rq->sg[i], page_address(first), PAGE_SIZE);
 
 		/* chain new page in list head to match sg */
-		first->private = (unsigned long)list;
+		page_chain_add(first, list);
 		list = first;
 	}
 
@@ -1929,7 +1932,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 	sg_set_buf(&rq->sg[1], p + offset, PAGE_SIZE - offset);
 
 	/* chain first in list head */
-	first->private = (unsigned long)list;
+	page_chain_add(first, list);
 	err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
 				  first, gfp);
 	if (err < 0)
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-11  2:51 [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (2 preceding siblings ...)
  2024-04-11  2:51 ` [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page Xuan Zhuo
@ 2024-04-11  2:51 ` Xuan Zhuo
  2024-04-11 16:34   ` kernel test robot
                     ` (3 more replies)
  2024-04-11  2:51 ` [PATCH vhost 5/6] virtio_net: enable premapped by default Xuan Zhuo
  2024-04-11  2:51 ` [PATCH vhost 6/6] virtio_net: rx remove premapped failover code Xuan Zhuo
  5 siblings, 4 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-11  2:51 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

In big mode, pre-mapping DMA is beneficial because if the pages are not
used, we can reuse them without needing to unmap and remap.

We require space to store the DMA address. I use page.dma_addr from the
pp sub-struct inside struct page to store it.

Every page retrieved from get_a_page() is mapped, and its DMA address is
stored in page.dma_addr. When a page is returned to the chain, we check
the DMA status; if it is not mapped (potentially having been unmapped),
we remap it before returning it to the chain.
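
A condensed sketch of that lifecycle (illustration only, written against
the helpers this patch introduces; the real error handling lives in the
diff below):

static void example_big_mode_lifecycle(struct receive_queue *rq, gfp_t gfp)
{
	struct page *p = get_a_page(rq, gfp);	/* comes back already DMA-mapped */

	if (!p)
		return;

	/* the stored DMA address, not a virtual address, goes into the sg */
	sg_fill_dma(&rq->sg[0], page_dma_addr(p), PAGE_SIZE);

	/* when the CPU needs the data, the page is unmapped ... */
	page_chain_unmap(rq, p);	/* page_dma_addr(p) becomes DMA_MAPPING_ERROR */

	/* ... and give_pages() remaps it via page_chain_map() before chaining it */
	give_pages(rq, p);
}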

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
 1 file changed, 81 insertions(+), 17 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4446fb54de6d..7ea7e9bcd5d7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
 
 #define page_chain_next(p)	((struct page *)((p)->pp))
 #define page_chain_add(p, n)	((p)->pp = (void *)n)
+#define page_dma_addr(p)	((p)->dma_addr)
 
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
@@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
 	return (struct virtio_net_common_hdr *)skb->cb;
 }
 
+static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
+{
+	sg->dma_address = addr;
+	sg->length = len;
+}
+
+static void page_chain_unmap(struct receive_queue *rq, struct page *p)
+{
+	virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
+				       DMA_FROM_DEVICE, 0);
+
+	page_dma_addr(p) = DMA_MAPPING_ERROR;
+}
+
+static int page_chain_map(struct receive_queue *rq, struct page *p)
+{
+	dma_addr_t addr;
+
+	addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
+	if (virtqueue_dma_mapping_error(rq->vq, addr))
+		return -ENOMEM;
+
+	page_dma_addr(p) = addr;
+	return 0;
+}
+
+static void page_chain_release(struct receive_queue *rq)
+{
+	struct page *p, *n;
+
+	for (p = rq->pages; p; p = n) {
+		n = page_chain_next(p);
+
+		page_chain_unmap(rq, p);
+		__free_pages(p, 0);
+	}
+
+	rq->pages = NULL;
+}
+
 /*
  * put the whole most recent used list in the beginning for reuse
  */
@@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
 {
 	struct page *end;
 
+	if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
+		if (page_chain_map(rq, page)) {
+			__free_pages(page, 0);
+			return;
+		}
+	}
+
 	/* Find end of list, sew whole thing into vi->rq.pages. */
 	for (end = page; page_chain_next(end); end = page_chain_next(end));
 
@@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 		rq->pages = page_chain_next(p);
 		/* clear chain here, it is used to chain pages */
 		page_chain_add(p, NULL);
-	} else
+	} else {
 		p = alloc_page(gfp_mask);
+
+		if (page_chain_map(rq, p)) {
+			__free_pages(p, 0);
+			return NULL;
+		}
+	}
+
 	return p;
 }
 
@@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 			return NULL;
 
 		page = page_chain_next(page);
-		if (page)
-			give_pages(rq, page);
 		goto ok;
 	}
 
@@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 			skb_add_rx_frag(skb, 0, page, offset, len, truesize);
 		else
 			page_to_free = page;
+		page = NULL;
 		goto ok;
 	}
 
@@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	BUG_ON(offset >= PAGE_SIZE);
 	while (len) {
 		unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
+
+		/* unmap the page before using it. */
+		if (!offset)
+			page_chain_unmap(rq, page);
+
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
 				frag_size, truesize);
 		len -= frag_size;
@@ -664,15 +723,15 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		offset = 0;
 	}
 
-	if (page)
-		give_pages(rq, page);
-
 ok:
 	hdr = skb_vnet_common_hdr(skb);
 	memcpy(hdr, hdr_p, hdr_len);
 	if (page_to_free)
 		put_page(page_to_free);
 
+	if (page)
+		give_pages(rq, page);
+
 	return skb;
 }
 
@@ -823,7 +882,8 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
 
 	rq = &vi->rq[i];
 
-	if (rq->do_dma)
+	/* Skip the unmap for big mode. */
+	if (!vi->big_packets || vi->mergeable_rx_bufs)
 		virtnet_rq_unmap(rq, buf, 0);
 
 	virtnet_rq_free_buf(vi, rq, buf);
@@ -1346,8 +1406,12 @@ static struct sk_buff *receive_big(struct net_device *dev,
 				   struct virtnet_rq_stats *stats)
 {
 	struct page *page = buf;
-	struct sk_buff *skb =
-		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, 0);
+	struct sk_buff *skb;
+
+	/* Unmap first page. The follow code may read this page. */
+	page_chain_unmap(rq, page);
+
+	skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, 0);
 
 	u64_stats_add(&stats->bytes, len - vi->hdr_len);
 	if (unlikely(!skb))
@@ -1896,7 +1960,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 			   gfp_t gfp)
 {
 	struct page *first, *list = NULL;
-	char *p;
+	dma_addr_t p;
 	int i, err, offset;
 
 	sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2);
@@ -1909,7 +1973,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 				give_pages(rq, list);
 			return -ENOMEM;
 		}
-		sg_set_buf(&rq->sg[i], page_address(first), PAGE_SIZE);
+		sg_fill_dma(&rq->sg[i], page_dma_addr(first), PAGE_SIZE);
 
 		/* chain new page in list head to match sg */
 		page_chain_add(first, list);
@@ -1921,15 +1985,16 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 		give_pages(rq, list);
 		return -ENOMEM;
 	}
-	p = page_address(first);
+
+	p = page_dma_addr(first);
 
 	/* rq->sg[0], rq->sg[1] share the same page */
 	/* a separated rq->sg[0] for header - required in case !any_header_sg */
-	sg_set_buf(&rq->sg[0], p, vi->hdr_len);
+	sg_fill_dma(&rq->sg[0], p, vi->hdr_len);
 
 	/* rq->sg[1] for data packet, from offset */
 	offset = sizeof(struct padded_vnet_hdr);
-	sg_set_buf(&rq->sg[1], p + offset, PAGE_SIZE - offset);
+	sg_fill_dma(&rq->sg[1], p + offset, PAGE_SIZE - offset);
 
 	/* chain first in list head */
 	page_chain_add(first, list);
@@ -2131,7 +2196,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 		}
 	} else {
 		while (packets < budget &&
-		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
+		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
 			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
 			packets++;
 		}
@@ -4252,8 +4317,7 @@ static void _free_receive_bufs(struct virtnet_info *vi)
 	int i;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		while (vi->rq[i].pages)
-			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
+		page_chain_release(&vi->rq[i]);
 
 		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
 		RCU_INIT_POINTER(vi->rq[i].xdp_prog, NULL);
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost 5/6] virtio_net: enable premapped by default
  2024-04-11  2:51 [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (3 preceding siblings ...)
  2024-04-11  2:51 ` [PATCH vhost 4/6] virtio_net: big mode support premapped Xuan Zhuo
@ 2024-04-11  2:51 ` Xuan Zhuo
  2024-04-18  6:26   ` Jason Wang
  2024-04-11  2:51 ` [PATCH vhost 6/6] virtio_net: rx remove premapped failover code Xuan Zhuo
  5 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-11  2:51 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Currently, big, merge, and small modes all support the premapped mode.
We can now enable premapped mode by default. Furthermore,
virtqueue_set_dma_premapped() must succeed when called immediately after
find_vqs(). Consequently, we can assume that premapped mode is always
enabled.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7ea7e9bcd5d7..f0faf7c0fe59 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -860,15 +860,13 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 
 static void virtnet_rq_set_premapped(struct virtnet_info *vi)
 {
-	int i;
-
-	/* disable for big mode */
-	if (!vi->mergeable_rx_bufs && vi->big_packets)
-		return;
+	int i, err;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		if (virtqueue_set_dma_premapped(vi->rq[i].vq))
-			continue;
+		err = virtqueue_set_dma_premapped(vi->rq[i].vq);
+
+		/* never happen */
+		BUG_ON(err);
 
 		vi->rq[i].do_dma = true;
 	}
-- 
2.32.0.3.g01195cf9f



* [PATCH vhost 6/6] virtio_net: rx remove premapped failover code
  2024-04-11  2:51 [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (4 preceding siblings ...)
  2024-04-11  2:51 ` [PATCH vhost 5/6] virtio_net: enable premapped by default Xuan Zhuo
@ 2024-04-11  2:51 ` Xuan Zhuo
  2024-04-18  6:31   ` Jason Wang
  5 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-11  2:51 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Now, for the merge and small modes, premapped mode can be enabled
unconditionally, so we can remove the failover code.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 78 +++++++++++++++++-----------------------
 1 file changed, 32 insertions(+), 46 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f0faf7c0fe59..493e2fccd7b2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -217,9 +217,6 @@ struct receive_queue {
 
 	/* Record the last dma info to free after new pages is allocated. */
 	struct virtnet_rq_dma *last_dma;
-
-	/* Do dma by self */
-	bool do_dma;
 };
 
 /* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -769,7 +766,7 @@ static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
 	void *buf;
 
 	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
-	if (buf && rq->do_dma)
+	if (buf)
 		virtnet_rq_unmap(rq, buf, *len);
 
 	return buf;
@@ -782,11 +779,6 @@ static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
 	u32 offset;
 	void *head;
 
-	if (!rq->do_dma) {
-		sg_init_one(rq->sg, buf, len);
-		return;
-	}
-
 	head = page_address(rq->alloc_frag.page);
 
 	offset = buf - head;
@@ -812,44 +804,42 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 
 	head = page_address(alloc_frag->page);
 
-	if (rq->do_dma) {
-		dma = head;
-
-		/* new pages */
-		if (!alloc_frag->offset) {
-			if (rq->last_dma) {
-				/* Now, the new page is allocated, the last dma
-				 * will not be used. So the dma can be unmapped
-				 * if the ref is 0.
-				 */
-				virtnet_rq_unmap(rq, rq->last_dma, 0);
-				rq->last_dma = NULL;
-			}
+	dma = head;
 
-			dma->len = alloc_frag->size - sizeof(*dma);
+	/* new pages */
+	if (!alloc_frag->offset) {
+		if (rq->last_dma) {
+			/* Now, the new page is allocated, the last dma
+			 * will not be used. So the dma can be unmapped
+			 * if the ref is 0.
+			 */
+			virtnet_rq_unmap(rq, rq->last_dma, 0);
+			rq->last_dma = NULL;
+		}
 
-			addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
-							      dma->len, DMA_FROM_DEVICE, 0);
-			if (virtqueue_dma_mapping_error(rq->vq, addr))
-				return NULL;
+		dma->len = alloc_frag->size - sizeof(*dma);
 
-			dma->addr = addr;
-			dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
+		addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
+						      dma->len, DMA_FROM_DEVICE, 0);
+		if (virtqueue_dma_mapping_error(rq->vq, addr))
+			return NULL;
 
-			/* Add a reference to dma to prevent the entire dma from
-			 * being released during error handling. This reference
-			 * will be freed after the pages are no longer used.
-			 */
-			get_page(alloc_frag->page);
-			dma->ref = 1;
-			alloc_frag->offset = sizeof(*dma);
+		dma->addr = addr;
+		dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
 
-			rq->last_dma = dma;
-		}
+		/* Add a reference to dma to prevent the entire dma from
+		 * being released during error handling. This reference
+		 * will be freed after the pages are no longer used.
+		 */
+		get_page(alloc_frag->page);
+		dma->ref = 1;
+		alloc_frag->offset = sizeof(*dma);
 
-		++dma->ref;
+		rq->last_dma = dma;
 	}
 
+	++dma->ref;
+
 	buf = head + alloc_frag->offset;
 
 	get_page(alloc_frag->page);
@@ -867,8 +857,6 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
 
 		/* never happen */
 		BUG_ON(err);
-
-		vi->rq[i].do_dma = true;
 	}
 }
 
@@ -1946,8 +1934,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
-			virtnet_rq_unmap(rq, buf, 0);
+		virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
 
@@ -2062,8 +2049,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	ctx = mergeable_len_to_ctx(len + room, headroom);
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
-			virtnet_rq_unmap(rq, buf, 0);
+		virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
 
@@ -4336,7 +4322,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 	int i;
 	for (i = 0; i < vi->max_queue_pairs; i++)
 		if (vi->rq[i].alloc_frag.page) {
-			if (vi->rq[i].do_dma && vi->rq[i].last_dma)
+			if (vi->rq[i].last_dma)
 				virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
 			put_page(vi->rq[i].alloc_frag.page);
 		}
-- 
2.32.0.3.g01195cf9f



* Re: [PATCH vhost 1/6] virtio_ring: introduce dma map api for page
  2024-04-11  2:51 ` [PATCH vhost 1/6] virtio_ring: introduce dma map api for page Xuan Zhuo
@ 2024-04-11 11:45   ` Alexander Lobakin
  2024-04-12  3:48     ` Xuan Zhuo
  2024-04-18  6:08   ` Jason Wang
  1 sibling, 1 reply; 49+ messages in thread
From: Alexander Lobakin @ 2024-04-11 11:45 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Thu, 11 Apr 2024 10:51:22 +0800

> The virtio-net big mode sq will use these APIs to map the pages.
> 
> dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
>                                        size_t offset, size_t size,
>                                        enum dma_data_direction dir,
>                                        unsigned long attrs);
> void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
>                                    size_t size, enum dma_data_direction dir,
>                                    unsigned long attrs);
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 52 ++++++++++++++++++++++++++++++++++++
>  include/linux/virtio.h       |  7 +++++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 70de1a9a81a3..1b9fb680cff3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -3100,6 +3100,58 @@ void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_dma_unmap_single_attrs);
>  
> +/**
> + * virtqueue_dma_map_page_attrs - map DMA for _vq
> + * @_vq: the struct virtqueue we're talking about.
> + * @page: the page to do dma
> + * @offset: the offset inside the page
> + * @size: the size of the page to do dma
> + * @dir: DMA direction
> + * @attrs: DMA Attrs
> + *
> + * The caller calls this to do dma mapping in advance. The DMA address can be
> + * passed to this _vq when it is in pre-mapped mode.
> + *
> + * return DMA address. Caller should check that by virtqueue_dma_mapping_error().
> + */
> +dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
> +					size_t offset, size_t size,
> +					enum dma_data_direction dir,
> +					unsigned long attrs)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	if (!vq->use_dma_api)
> +		return page_to_phys(page) + offset;

page_to_phys() and the actual page DMA address may differ. See
page_to_dma()/virt_to_dma(). I believe this is not correct.

> +
> +	return dma_map_page_attrs(vring_dma_dev(vq), page, offset, size, dir, attrs);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_dma_map_page_attrs);

Could you try make these functions static inlines and run bloat-o-meter?
They seem to be small and probably you'd get better performance.

> +
> +/**
> + * virtqueue_dma_unmap_page_attrs - unmap DMA for _vq
> + * @_vq: the struct virtqueue we're talking about.
> + * @addr: the dma address to unmap
> + * @size: the size of the buffer
> + * @dir: DMA direction
> + * @attrs: DMA Attrs
> + *
> + * Unmap the address that is mapped by the virtqueue_dma_map_* APIs.
> + *
> + */
> +void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
> +				    size_t size, enum dma_data_direction dir,
> +				    unsigned long attrs)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	if (!vq->use_dma_api)
> +		return;
> +
> +	dma_unmap_page_attrs(vring_dma_dev(vq), addr, size, dir, attrs);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_dma_unmap_page_attrs);
> +
>  /**
>   * virtqueue_dma_mapping_error - check dma address
>   * @_vq: the struct virtqueue we're talking about.
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 26c4325aa373..d6c699553979 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -228,6 +228,13 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, size
>  void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
>  				      size_t size, enum dma_data_direction dir,
>  				      unsigned long attrs);
> +dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
> +					size_t offset, size_t size,
> +					enum dma_data_direction dir,
> +					unsigned long attrs);
> +void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
> +				    size_t size, enum dma_data_direction dir,
> +				    unsigned long attrs);
>  int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);
>  
>  bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);

Thanks,
Olek


* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-11  2:51 ` [PATCH vhost 4/6] virtio_net: big mode support premapped Xuan Zhuo
@ 2024-04-11 16:34   ` kernel test robot
  2024-04-11 20:11   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2024-04-11 16:34 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization
  Cc: llvm, oe-kbuild-all, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Hi Xuan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master v6.9-rc3 next-20240411]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xuan-Zhuo/virtio_ring-introduce-dma-map-api-for-page/20240411-105318
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/20240411025127.51945-5-xuanzhuo%40linux.alibaba.com
patch subject: [PATCH vhost 4/6] virtio_net: big mode support premapped
config: i386-randconfig-016-20240411 (https://download.01.org/0day-ci/archive/20240412/202404120044.VKtjHMzy-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240412/202404120044.VKtjHMzy-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404120044.VKtjHMzy-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/net/virtio_net.c:449:21: warning: implicit conversion from 'dma_addr_t' (aka 'unsigned long long') to 'unsigned long' changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
     449 |         page_dma_addr(p) = DMA_MAPPING_ERROR;
         |                          ~ ^~~~~~~~~~~~~~~~~
   include/linux/dma-mapping.h:75:29: note: expanded from macro 'DMA_MAPPING_ERROR'
      75 | #define DMA_MAPPING_ERROR               (~(dma_addr_t)0)
         |                                          ^~~~~~~~~~~~~~
>> drivers/net/virtio_net.c:485:26: warning: result of comparison of constant 18446744073709551615 with expression of type 'unsigned long' is always false [-Wtautological-constant-out-of-range-compare]
     485 |         if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
         |             ~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~
   2 warnings generated.


vim +449 drivers/net/virtio_net.c

   443	
   444	static void page_chain_unmap(struct receive_queue *rq, struct page *p)
   445	{
   446		virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
   447					       DMA_FROM_DEVICE, 0);
   448	
 > 449		page_dma_addr(p) = DMA_MAPPING_ERROR;
   450	}
   451	
   452	static int page_chain_map(struct receive_queue *rq, struct page *p)
   453	{
   454		dma_addr_t addr;
   455	
   456		addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
   457		if (virtqueue_dma_mapping_error(rq->vq, addr))
   458			return -ENOMEM;
   459	
   460		page_dma_addr(p) = addr;
   461		return 0;
   462	}
   463	
   464	static void page_chain_release(struct receive_queue *rq)
   465	{
   466		struct page *p, *n;
   467	
   468		for (p = rq->pages; p; p = n) {
   469			n = page_chain_next(p);
   470	
   471			page_chain_unmap(rq, p);
   472			__free_pages(p, 0);
   473		}
   474	
   475		rq->pages = NULL;
   476	}
   477	
   478	/*
   479	 * put the whole most recent used list in the beginning for reuse
   480	 */
   481	static void give_pages(struct receive_queue *rq, struct page *page)
   482	{
   483		struct page *end;
   484	
 > 485		if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
   486			if (page_chain_map(rq, page)) {
   487				__free_pages(page, 0);
   488				return;
   489			}
   490		}
   491	
   492		/* Find end of list, sew whole thing into vi->rq.pages. */
   493		for (end = page; page_chain_next(end); end = page_chain_next(end));
   494	
   495		page_chain_add(end, rq->pages);
   496		rq->pages = page;
   497	}
   498	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-11  2:51 ` [PATCH vhost 4/6] virtio_net: big mode support premapped Xuan Zhuo
  2024-04-11 16:34   ` kernel test robot
@ 2024-04-11 20:11   ` kernel test robot
  2024-04-14  9:48   ` Dan Carpenter
  2024-04-18  6:25   ` Jason Wang
  3 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2024-04-11 20:11 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization
  Cc: oe-kbuild-all, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Hi Xuan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master v6.9-rc3 next-20240411]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xuan-Zhuo/virtio_ring-introduce-dma-map-api-for-page/20240411-105318
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/20240411025127.51945-5-xuanzhuo%40linux.alibaba.com
patch subject: [PATCH vhost 4/6] virtio_net: big mode support premapped
config: i386-randconfig-062-20240411 (https://download.01.org/0day-ci/archive/20240412/202404120417.VUAT9H5b-lkp@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240412/202404120417.VUAT9H5b-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404120417.VUAT9H5b-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/linux/skbuff.h:28,
                    from include/net/net_namespace.h:43,
                    from include/linux/netdevice.h:38,
                    from drivers/net/virtio_net.c:7:
   drivers/net/virtio_net.c: In function 'page_chain_unmap':
>> include/linux/dma-mapping.h:75:41: warning: conversion from 'long long unsigned int' to 'long unsigned int' changes value from '18446744073709551615' to '4294967295' [-Woverflow]
      75 | #define DMA_MAPPING_ERROR               (~(dma_addr_t)0)
         |                                         ^
   drivers/net/virtio_net.c:449:28: note: in expansion of macro 'DMA_MAPPING_ERROR'
     449 |         page_dma_addr(p) = DMA_MAPPING_ERROR;
         |                            ^~~~~~~~~~~~~~~~~


vim +75 include/linux/dma-mapping.h

b2fb366425ceb8 Mitchel Humpherys 2017-01-06  64  
eba304c6861613 Christoph Hellwig 2020-09-11  65  /*
eba304c6861613 Christoph Hellwig 2020-09-11  66   * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
eba304c6861613 Christoph Hellwig 2020-09-11  67   * be given to a device to use as a DMA source or target.  It is specific to a
eba304c6861613 Christoph Hellwig 2020-09-11  68   * given device and there may be a translation between the CPU physical address
eba304c6861613 Christoph Hellwig 2020-09-11  69   * space and the bus address space.
eba304c6861613 Christoph Hellwig 2020-09-11  70   *
eba304c6861613 Christoph Hellwig 2020-09-11  71   * DMA_MAPPING_ERROR is the magic error code if a mapping failed.  It should not
eba304c6861613 Christoph Hellwig 2020-09-11  72   * be used directly in drivers, but checked for using dma_mapping_error()
eba304c6861613 Christoph Hellwig 2020-09-11  73   * instead.
eba304c6861613 Christoph Hellwig 2020-09-11  74   */
42ee3cae0ed38b Christoph Hellwig 2018-11-21 @75  #define DMA_MAPPING_ERROR		(~(dma_addr_t)0)
42ee3cae0ed38b Christoph Hellwig 2018-11-21  76  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH vhost 1/6] virtio_ring: introduce dma map api for page
  2024-04-11 11:45   ` Alexander Lobakin
@ 2024-04-12  3:48     ` Xuan Zhuo
  0 siblings, 0 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-12  3:48 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: virtualization, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, 11 Apr 2024 13:45:28 +0200, Alexander Lobakin <aleksander.lobakin@intel.com> wrote:
> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Date: Thu, 11 Apr 2024 10:51:22 +0800
>
> > The virtio-net big mode sq will use these APIs to map the pages.
> >
> > dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
> >                                        size_t offset, size_t size,
> >                                        enum dma_data_direction dir,
> >                                        unsigned long attrs);
> > void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
> >                                    size_t size, enum dma_data_direction dir,
> >                                    unsigned long attrs);
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 52 ++++++++++++++++++++++++++++++++++++
> >  include/linux/virtio.h       |  7 +++++
> >  2 files changed, 59 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 70de1a9a81a3..1b9fb680cff3 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -3100,6 +3100,58 @@ void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_dma_unmap_single_attrs);
> >
> > +/**
> > + * virtqueue_dma_map_page_attrs - map DMA for _vq
> > + * @_vq: the struct virtqueue we're talking about.
> > + * @page: the page to do dma
> > + * @offset: the offset inside the page
> > + * @size: the size of the page to do dma
> > + * @dir: DMA direction
> > + * @attrs: DMA Attrs
> > + *
> > + * The caller calls this to do dma mapping in advance. The DMA address can be
> > + * passed to this _vq when it is in pre-mapped mode.
> > + *
> > + * return DMA address. Caller should check that by virtqueue_dma_mapping_error().
> > + */
> > +dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
> > +					size_t offset, size_t size,
> > +					enum dma_data_direction dir,
> > +					unsigned long attrs)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	if (!vq->use_dma_api)
> > +		return page_to_phys(page) + offset;
>
> page_to_phys() and the actual page DMA address may differ. See
> page_to_dma()/virt_to_dma(). I believe this is not correct.


For virtio, if use_dma_api is false, we do not try to get a DMA
address; we use the physical address instead.


>
> > +
> > +	return dma_map_page_attrs(vring_dma_dev(vq), page, offset, size, dir, attrs);
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_dma_map_page_attrs);
>
> Could you try make these functions static inlines and run bloat-o-meter?
> They seem to be small and probably you'd get better performance.

YES.

But struct vring_virtqueue is defined in the .c file, so we would have
to move the structure to the .h file to do that.

I plan to do that in the future.

Thanks



>
> > +
> > +/**
> > + * virtqueue_dma_unmap_page_attrs - unmap DMA for _vq
> > + * @_vq: the struct virtqueue we're talking about.
> > + * @addr: the dma address to unmap
> > + * @size: the size of the buffer
> > + * @dir: DMA direction
> > + * @attrs: DMA Attrs
> > + *
> > + * Unmap the address that is mapped by the virtqueue_dma_map_* APIs.
> > + *
> > + */
> > +void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
> > +				    size_t size, enum dma_data_direction dir,
> > +				    unsigned long attrs)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	if (!vq->use_dma_api)
> > +		return;
> > +
> > +	dma_unmap_page_attrs(vring_dma_dev(vq), addr, size, dir, attrs);
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_dma_unmap_page_attrs);
> > +
> >  /**
> >   * virtqueue_dma_mapping_error - check dma address
> >   * @_vq: the struct virtqueue we're talking about.
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 26c4325aa373..d6c699553979 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -228,6 +228,13 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, size
> >  void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
> >  				      size_t size, enum dma_data_direction dir,
> >  				      unsigned long attrs);
> > +dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
> > +					size_t offset, size_t size,
> > +					enum dma_data_direction dir,
> > +					unsigned long attrs);
> > +void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
> > +				    size_t size, enum dma_data_direction dir,
> > +				    unsigned long attrs);
> >  int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);
> >
> >  bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);
>
> Thanks,
> Olek


* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-11  2:51 ` [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page Xuan Zhuo
@ 2024-04-12  4:47   ` Jason Wang
  2024-04-12  5:35     ` Xuan Zhuo
  2024-04-18  6:11   ` Jason Wang
  1 sibling, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-12  4:47 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now, we chain the pages of big mode by the page's private variable.
> But a subsequent patch aims to make the big mode to support
> premapped mode. This requires additional space to store the dma addr.
>
> Within the sub-struct that contains the 'private', there is no suitable
> variable for storing the DMA addr.
>
>                 struct {        /* Page cache and anonymous pages */
>                         /**
>                          * @lru: Pageout list, eg. active_list protected by
>                          * lruvec->lru_lock.  Sometimes used as a generic list
>                          * by the page owner.
>                          */
>                         union {
>                                 struct list_head lru;
>
>                                 /* Or, for the Unevictable "LRU list" slot */
>                                 struct {
>                                         /* Always even, to negate PageTail */
>                                         void *__filler;
>                                         /* Count page's or folio's mlocks */
>                                         unsigned int mlock_count;
>                                 };
>
>                                 /* Or, free page */
>                                 struct list_head buddy_list;
>                                 struct list_head pcp_list;
>                         };
>                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
>                         struct address_space *mapping;
>                         union {
>                                 pgoff_t index;          /* Our offset within mapping. */
>                                 unsigned long share;    /* share count for fsdax */
>                         };
>                         /**
>                          * @private: Mapping-private opaque data.
>                          * Usually used for buffer_heads if PagePrivate.
>                          * Used for swp_entry_t if PageSwapCache.
>                          * Indicates order in the buddy system if PageBuddy.
>                          */
>                         unsigned long private;
>                 };
>
> But within the page pool struct, we have a variable called
> dma_addr that is appropriate for storing dma addr.
> And that struct is used by netstack. That works to our advantage.
>
>                 struct {        /* page_pool used by netstack */
>                         /**
>                          * @pp_magic: magic value to avoid recycling non
>                          * page_pool allocated pages.
>                          */
>                         unsigned long pp_magic;
>                         struct page_pool *pp;
>                         unsigned long _pp_mapping_pad;
>                         unsigned long dma_addr;
>                         atomic_long_t pp_ref_count;
>                 };
>
> On the other side, we should use variables from the same sub-struct.
> So this patch replaces the "private" with "pp".
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---

Instead of doing a customized version of page pool, can we simply
switch to use page pool for big mode instead? Then we don't need to
bother the dma stuffs.

Thanks



* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-12  4:47   ` Jason Wang
@ 2024-04-12  5:35     ` Xuan Zhuo
  2024-04-12  5:49       ` Jason Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-12  5:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Now, we chain the pages of big mode by the page's private variable.
> > But a subsequent patch aims to make the big mode to support
> > premapped mode. This requires additional space to store the dma addr.
> >
> > Within the sub-struct that contains the 'private', there is no suitable
> > variable for storing the DMA addr.
> >
> >                 struct {        /* Page cache and anonymous pages */
> >                         /**
> >                          * @lru: Pageout list, eg. active_list protected by
> >                          * lruvec->lru_lock.  Sometimes used as a generic list
> >                          * by the page owner.
> >                          */
> >                         union {
> >                                 struct list_head lru;
> >
> >                                 /* Or, for the Unevictable "LRU list" slot */
> >                                 struct {
> >                                         /* Always even, to negate PageTail */
> >                                         void *__filler;
> >                                         /* Count page's or folio's mlocks */
> >                                         unsigned int mlock_count;
> >                                 };
> >
> >                                 /* Or, free page */
> >                                 struct list_head buddy_list;
> >                                 struct list_head pcp_list;
> >                         };
> >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> >                         struct address_space *mapping;
> >                         union {
> >                                 pgoff_t index;          /* Our offset within mapping. */
> >                                 unsigned long share;    /* share count for fsdax */
> >                         };
> >                         /**
> >                          * @private: Mapping-private opaque data.
> >                          * Usually used for buffer_heads if PagePrivate.
> >                          * Used for swp_entry_t if PageSwapCache.
> >                          * Indicates order in the buddy system if PageBuddy.
> >                          */
> >                         unsigned long private;
> >                 };
> >
> > But within the page pool struct, we have a variable called
> > dma_addr that is appropriate for storing dma addr.
> > And that struct is used by netstack. That works to our advantage.
> >
> >                 struct {        /* page_pool used by netstack */
> >                         /**
> >                          * @pp_magic: magic value to avoid recycling non
> >                          * page_pool allocated pages.
> >                          */
> >                         unsigned long pp_magic;
> >                         struct page_pool *pp;
> >                         unsigned long _pp_mapping_pad;
> >                         unsigned long dma_addr;
> >                         atomic_long_t pp_ref_count;
> >                 };
> >
> > On the other side, we should use variables from the same sub-struct.
> > So this patch replaces the "private" with "pp".
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
>
> Instead of doing a customized version of page pool, can we simply
> switch to use page pool for big mode instead? Then we don't need to
> bother the dma stuffs.


The page pool needs to do the dma by the DMA APIs.
So we can not use the page pool directly.

Thanks.


>
> Thanks
>


* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-12  5:35     ` Xuan Zhuo
@ 2024-04-12  5:49       ` Jason Wang
  2024-04-12  6:02         ` Xuan Zhuo
  2024-04-15  2:08         ` Xuan Zhuo
  0 siblings, 2 replies; 49+ messages in thread
From: Jason Wang @ 2024-04-12  5:49 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > Now, we chain the pages of big mode by the page's private variable.
> > > But a subsequent patch aims to make the big mode to support
> > > premapped mode. This requires additional space to store the dma addr.
> > >
> > > Within the sub-struct that contains the 'private', there is no suitable
> > > variable for storing the DMA addr.
> > >
> > >                 struct {        /* Page cache and anonymous pages */
> > >                         /**
> > >                          * @lru: Pageout list, eg. active_list protected by
> > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > >                          * by the page owner.
> > >                          */
> > >                         union {
> > >                                 struct list_head lru;
> > >
> > >                                 /* Or, for the Unevictable "LRU list" slot */
> > >                                 struct {
> > >                                         /* Always even, to negate PageTail */
> > >                                         void *__filler;
> > >                                         /* Count page's or folio's mlocks */
> > >                                         unsigned int mlock_count;
> > >                                 };
> > >
> > >                                 /* Or, free page */
> > >                                 struct list_head buddy_list;
> > >                                 struct list_head pcp_list;
> > >                         };
> > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > >                         struct address_space *mapping;
> > >                         union {
> > >                                 pgoff_t index;          /* Our offset within mapping. */
> > >                                 unsigned long share;    /* share count for fsdax */
> > >                         };
> > >                         /**
> > >                          * @private: Mapping-private opaque data.
> > >                          * Usually used for buffer_heads if PagePrivate.
> > >                          * Used for swp_entry_t if PageSwapCache.
> > >                          * Indicates order in the buddy system if PageBuddy.
> > >                          */
> > >                         unsigned long private;
> > >                 };
> > >
> > > But within the page pool struct, we have a variable called
> > > dma_addr that is appropriate for storing dma addr.
> > > And that struct is used by netstack. That works to our advantage.
> > >
> > >                 struct {        /* page_pool used by netstack */
> > >                         /**
> > >                          * @pp_magic: magic value to avoid recycling non
> > >                          * page_pool allocated pages.
> > >                          */
> > >                         unsigned long pp_magic;
> > >                         struct page_pool *pp;
> > >                         unsigned long _pp_mapping_pad;
> > >                         unsigned long dma_addr;
> > >                         atomic_long_t pp_ref_count;
> > >                 };
> > >
> > > On the other side, we should use variables from the same sub-struct.
> > > So this patch replaces the "private" with "pp".
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> >
> > Instead of doing a customized version of page pool, can we simply
> > switch to use page pool for big mode instead? Then we don't need to
> > bother the dma stuffs.
>
>
> The page pool needs to do the dma by the DMA APIs.
> So we can not use the page pool directly.

I found this:

define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
                                        * map/unmap

It seems to work here?
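
A minimal sketch of that setup on the driver side (illustrative values, a
hypothetical device pointer, error handling omitted; not code from this
series):

struct page_pool_params pp_params = {
	.flags		= PP_FLAG_DMA_MAP,	/* the pool maps/unmaps */
	.order		= 0,
	.pool_size	= 256,			/* illustrative */
	.nid		= NUMA_NO_NODE,
	.dev		= vdev->dev.parent,	/* hypothetical DMA device */
	.dma_dir	= DMA_FROM_DEVICE,
};
struct page_pool *pool = page_pool_create(&pp_params);
struct page *page = page_pool_dev_alloc_pages(pool);
dma_addr_t dma = page_pool_get_dma_addr(page);	/* filled in by the pool */

With this, page_pool_get_dma_addr() returns the address the pool stored when
it mapped the page, and the pool unmaps on release.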

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-12  5:49       ` Jason Wang
@ 2024-04-12  6:02         ` Xuan Zhuo
  2024-04-15  2:08         ` Xuan Zhuo
  1 sibling, 0 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-12  6:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > Now, we chain the pages of big mode by the page's private variable.
> > > > But a subsequent patch aims to make the big mode to support
> > > > premapped mode. This requires additional space to store the dma addr.
> > > >
> > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > variable for storing the DMA addr.
> > > >
> > > >                 struct {        /* Page cache and anonymous pages */
> > > >                         /**
> > > >                          * @lru: Pageout list, eg. active_list protected by
> > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > >                          * by the page owner.
> > > >                          */
> > > >                         union {
> > > >                                 struct list_head lru;
> > > >
> > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > >                                 struct {
> > > >                                         /* Always even, to negate PageTail */
> > > >                                         void *__filler;
> > > >                                         /* Count page's or folio's mlocks */
> > > >                                         unsigned int mlock_count;
> > > >                                 };
> > > >
> > > >                                 /* Or, free page */
> > > >                                 struct list_head buddy_list;
> > > >                                 struct list_head pcp_list;
> > > >                         };
> > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > >                         struct address_space *mapping;
> > > >                         union {
> > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > >                                 unsigned long share;    /* share count for fsdax */
> > > >                         };
> > > >                         /**
> > > >                          * @private: Mapping-private opaque data.
> > > >                          * Usually used for buffer_heads if PagePrivate.
> > > >                          * Used for swp_entry_t if PageSwapCache.
> > > >                          * Indicates order in the buddy system if PageBuddy.
> > > >                          */
> > > >                         unsigned long private;
> > > >                 };
> > > >
> > > > But within the page pool struct, we have a variable called
> > > > dma_addr that is appropriate for storing dma addr.
> > > > And that struct is used by netstack. That works to our advantage.
> > > >
> > > >                 struct {        /* page_pool used by netstack */
> > > >                         /**
> > > >                          * @pp_magic: magic value to avoid recycling non
> > > >                          * page_pool allocated pages.
> > > >                          */
> > > >                         unsigned long pp_magic;
> > > >                         struct page_pool *pp;
> > > >                         unsigned long _pp_mapping_pad;
> > > >                         unsigned long dma_addr;
> > > >                         atomic_long_t pp_ref_count;
> > > >                 };
> > > >
> > > > On the other side, we should use variables from the same sub-struct.
> > > > So this patch replaces the "private" with "pp".
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > >
> > > Instead of doing a customized version of page pool, can we simply
> > > switch to use page pool for big mode instead? Then we don't need to
> > > bother the dma stuffs.
> >
> >
> > The page pool needs to do the dma by the DMA APIs.
> > So we can not use the page pool directly.
>
> I found this:
>
> define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
>                                         * map/unmap

You are right. I missed this. I will try.

Thanks.


>
> It seems to work here?
>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-11  2:51 ` [PATCH vhost 4/6] virtio_net: big mode support premapped Xuan Zhuo
  2024-04-11 16:34   ` kernel test robot
  2024-04-11 20:11   ` kernel test robot
@ 2024-04-14  9:48   ` Dan Carpenter
  2024-04-18  6:25   ` Jason Wang
  3 siblings, 0 replies; 49+ messages in thread
From: Dan Carpenter @ 2024-04-14  9:48 UTC (permalink / raw)
  To: oe-kbuild, Xuan Zhuo, virtualization
  Cc: lkp, oe-kbuild-all, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Hi Xuan,

kernel test robot noticed the following build warnings:

[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xuan-Zhuo/virtio_ring-introduce-dma-map-api-for-page/20240411-105318
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/20240411025127.51945-5-xuanzhuo%40linux.alibaba.com
patch subject: [PATCH vhost 4/6] virtio_net: big mode support premapped
config: i386-randconfig-141-20240414 (https://download.01.org/0day-ci/archive/20240414/202404141343.iPhKo7zd-lkp@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202404141343.iPhKo7zd-lkp@intel.com/

New smatch warnings:
drivers/net/virtio_net.c:485 give_pages() warn: impossible condition '((page->dma_addr) == (~0)) => (0-u32max == u64max)'

vim +485 drivers/net/virtio_net.c

e9d7417b97f420 Jason Wang      2012-12-07  481  static void give_pages(struct receive_queue *rq, struct page *page)
0a888fd1f6320d Mark McLoughlin 2008-11-16  482  {
9ab86bbcf8be75 Shirley Ma      2010-01-29  483  	struct page *end;
0a888fd1f6320d Mark McLoughlin 2008-11-16  484  
59e4bcf761eeba Xuan Zhuo       2024-04-11 @485  	if (page_dma_addr(page) == DMA_MAPPING_ERROR) {

(struct page)->dma_addr is unsigned long but DMA_MAPPING_ERROR is
dma_addr_t.

59e4bcf761eeba Xuan Zhuo       2024-04-11  486  		if (page_chain_map(rq, page)) {
59e4bcf761eeba Xuan Zhuo       2024-04-11  487  			__free_pages(page, 0);
59e4bcf761eeba Xuan Zhuo       2024-04-11  488  			return;
59e4bcf761eeba Xuan Zhuo       2024-04-11  489  		}
59e4bcf761eeba Xuan Zhuo       2024-04-11  490  	}
59e4bcf761eeba Xuan Zhuo       2024-04-11  491  
e9d7417b97f420 Jason Wang      2012-12-07  492  	/* Find end of list, sew whole thing into vi->rq.pages. */
590f79cf558cb4 Xuan Zhuo       2024-04-11  493  	for (end = page; page_chain_next(end); end = page_chain_next(end));
590f79cf558cb4 Xuan Zhuo       2024-04-11  494  
590f79cf558cb4 Xuan Zhuo       2024-04-11  495  	page_chain_add(end, rq->pages);
e9d7417b97f420 Jason Wang      2012-12-07  496  	rq->pages = page;
0a888fd1f6320d Mark McLoughlin 2008-11-16  497  }
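
The impossible condition comes from comparing the unsigned long stored in the
page against the 64-bit dma_addr_t constant DMA_MAPPING_ERROR on this 32-bit
config. One illustrative way to keep the check meaningful (not necessarily
what a respin will do; the clear helper below is an assumption) is to define
the "not yet mapped" sentinel in the storage type:

/* Illustrative only: keep the sentinel in the same type as the storage
 * (unsigned long), so the comparison cannot become impossible on 32-bit
 * builds with a 64-bit dma_addr_t.
 */
#define PAGE_DMA_ADDR_UNMAPPED	(~0UL)

static void page_dma_addr_clear(struct page *page)	/* assumed helper */
{
	page->dma_addr = PAGE_DMA_ADDR_UNMAPPED;
}

static bool page_dma_addr_is_unmapped(struct page *page)
{
	return page_dma_addr(page) == PAGE_DMA_ADDR_UNMAPPED;
}

On 64-bit builds this sentinel has the same value as DMA_MAPPING_ERROR, so the
change only matters for 32-bit configs with a 64-bit dma_addr_t.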

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-12  5:49       ` Jason Wang
  2024-04-12  6:02         ` Xuan Zhuo
@ 2024-04-15  2:08         ` Xuan Zhuo
  2024-04-15  6:43           ` Jason Wang
  1 sibling, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-15  2:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > Now, we chain the pages of big mode by the page's private variable.
> > > > But a subsequent patch aims to make the big mode to support
> > > > premapped mode. This requires additional space to store the dma addr.
> > > >
> > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > variable for storing the DMA addr.
> > > >
> > > >                 struct {        /* Page cache and anonymous pages */
> > > >                         /**
> > > >                          * @lru: Pageout list, eg. active_list protected by
> > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > >                          * by the page owner.
> > > >                          */
> > > >                         union {
> > > >                                 struct list_head lru;
> > > >
> > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > >                                 struct {
> > > >                                         /* Always even, to negate PageTail */
> > > >                                         void *__filler;
> > > >                                         /* Count page's or folio's mlocks */
> > > >                                         unsigned int mlock_count;
> > > >                                 };
> > > >
> > > >                                 /* Or, free page */
> > > >                                 struct list_head buddy_list;
> > > >                                 struct list_head pcp_list;
> > > >                         };
> > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > >                         struct address_space *mapping;
> > > >                         union {
> > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > >                                 unsigned long share;    /* share count for fsdax */
> > > >                         };
> > > >                         /**
> > > >                          * @private: Mapping-private opaque data.
> > > >                          * Usually used for buffer_heads if PagePrivate.
> > > >                          * Used for swp_entry_t if PageSwapCache.
> > > >                          * Indicates order in the buddy system if PageBuddy.
> > > >                          */
> > > >                         unsigned long private;
> > > >                 };
> > > >
> > > > But within the page pool struct, we have a variable called
> > > > dma_addr that is appropriate for storing dma addr.
> > > > And that struct is used by netstack. That works to our advantage.
> > > >
> > > >                 struct {        /* page_pool used by netstack */
> > > >                         /**
> > > >                          * @pp_magic: magic value to avoid recycling non
> > > >                          * page_pool allocated pages.
> > > >                          */
> > > >                         unsigned long pp_magic;
> > > >                         struct page_pool *pp;
> > > >                         unsigned long _pp_mapping_pad;
> > > >                         unsigned long dma_addr;
> > > >                         atomic_long_t pp_ref_count;
> > > >                 };
> > > >
> > > > On the other side, we should use variables from the same sub-struct.
> > > > So this patch replaces the "private" with "pp".
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > >
> > > Instead of doing a customized version of page pool, can we simply
> > > switch to use page pool for big mode instead? Then we don't need to
> > > bother the dma stuffs.
> >
> >
> > The page pool needs to do the dma by the DMA APIs.
> > So we can not use the page pool directly.
>
> I found this:
>
> define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
>                                         * map/unmap
>
> It seems to work here?


I have studied the page pool mechanism and believe that we cannot use it
directly. We can make the page pool to bypass the DMA operations.
This allows us to handle DMA within virtio-net for pages allocated from the page
pool. Furthermore, we can utilize page pool helpers to associate the DMA address
to the page.
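
Concretely, the bypass described above would look roughly like the sketch
below (assumed code, not from this series; the function name is illustrative,
and the mapping goes through the virtqueue DMA helper added earlier in this
series):

/* Assumed sketch of the bypass: the pool only allocates/recycles pages,
 * the driver maps them itself and records the address via the page pool
 * helper. The pool would be created without PP_FLAG_DMA_MAP.
 */
static int virtnet_pp_map_page(struct receive_queue *rq, struct page *page)
{
	dma_addr_t dma;

	dma = virtqueue_dma_map_page_attrs(rq->vq, page, 0, PAGE_SIZE,
					   DMA_FROM_DEVICE, 0);
	if (virtqueue_dma_mapping_error(rq->vq, dma))
		return -ENOMEM;

	page_pool_set_dma_addr(page, dma);	/* page pool helper */
	return 0;
}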

However, the critical issue pertains to unmapping. Ideally, we want to return
the mapped pages to the page pool and reuse them. In doing so, we can omit the
unmapping and remapping steps.

Currently, there's a caveat: when the page pool cache is full, it disconnects
and releases the pages. When the pool hits its capacity, pages are relinquished
without a chance for unmapping. If we were to unmap pages each time before
returning them to the pool, we would negate the benefits of bypassing the
mapping and unmapping process altogether.

Thanks.



>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-15  2:08         ` Xuan Zhuo
@ 2024-04-15  6:43           ` Jason Wang
  2024-04-15  8:36             ` Xuan Zhuo
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-15  6:43 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > But a subsequent patch aims to make the big mode to support
> > > > > premapped mode. This requires additional space to store the dma addr.
> > > > >
> > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > variable for storing the DMA addr.
> > > > >
> > > > >                 struct {        /* Page cache and anonymous pages */
> > > > >                         /**
> > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > >                          * by the page owner.
> > > > >                          */
> > > > >                         union {
> > > > >                                 struct list_head lru;
> > > > >
> > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > >                                 struct {
> > > > >                                         /* Always even, to negate PageTail */
> > > > >                                         void *__filler;
> > > > >                                         /* Count page's or folio's mlocks */
> > > > >                                         unsigned int mlock_count;
> > > > >                                 };
> > > > >
> > > > >                                 /* Or, free page */
> > > > >                                 struct list_head buddy_list;
> > > > >                                 struct list_head pcp_list;
> > > > >                         };
> > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > >                         struct address_space *mapping;
> > > > >                         union {
> > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > >                         };
> > > > >                         /**
> > > > >                          * @private: Mapping-private opaque data.
> > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > >                          */
> > > > >                         unsigned long private;
> > > > >                 };
> > > > >
> > > > > But within the page pool struct, we have a variable called
> > > > > dma_addr that is appropriate for storing dma addr.
> > > > > And that struct is used by netstack. That works to our advantage.
> > > > >
> > > > >                 struct {        /* page_pool used by netstack */
> > > > >                         /**
> > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > >                          * page_pool allocated pages.
> > > > >                          */
> > > > >                         unsigned long pp_magic;
> > > > >                         struct page_pool *pp;
> > > > >                         unsigned long _pp_mapping_pad;
> > > > >                         unsigned long dma_addr;
> > > > >                         atomic_long_t pp_ref_count;
> > > > >                 };
> > > > >
> > > > > On the other side, we should use variables from the same sub-struct.
> > > > > So this patch replaces the "private" with "pp".
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > >
> > > > Instead of doing a customized version of page pool, can we simply
> > > > switch to use page pool for big mode instead? Then we don't need to
> > > > bother the dma stuffs.
> > >
> > >
> > > The page pool needs to do the dma by the DMA APIs.
> > > So we can not use the page pool directly.
> >
> > I found this:
> >
> > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> >                                         * map/unmap
> >
> > It seems to work here?
>
>
> I have studied the page pool mechanism and believe that we cannot use it
> directly. We can make the page pool to bypass the DMA operations.
> This allows us to handle DMA within virtio-net for pages allocated from the page
> pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> to the page.
>
> However, the critical issue pertains to unmapping. Ideally, we want to return
> the mapped pages to the page pool and reuse them. In doing so, we can omit the
> unmapping and remapping steps.
>
> Currently, there's a caveat: when the page pool cache is full, it disconnects
> and releases the pages. When the pool hits its capacity, pages are relinquished
> without a chance for unmapping.

Technically, when ptr_ring is full there could be a fallback, but then
it requires expensive synchronization between producer and consumer.
For virtio-net, it might not be a problem because add/get has been
synchronized. (It might be relaxed in the future, actually we've
already seen a requirement in the past for virtio-blk).

> If we were to unmap pages each time before
> returning them to the pool, we would negate the benefits of bypassing the
> mapping and unmapping process altogether.

Yes, but the problem in this approach is that it creates a corner
exception where dma_addr is used outside the page pool.

Maybe for big mode it doesn't matter too much if there's no
performance improvement.

Thanks

>
> Thanks.
>
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-15  6:43           ` Jason Wang
@ 2024-04-15  8:36             ` Xuan Zhuo
  2024-04-15  8:56               ` Jason Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-15  8:36 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > >
> > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > variable for storing the DMA addr.
> > > > > >
> > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > >                         /**
> > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > >                          * by the page owner.
> > > > > >                          */
> > > > > >                         union {
> > > > > >                                 struct list_head lru;
> > > > > >
> > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > >                                 struct {
> > > > > >                                         /* Always even, to negate PageTail */
> > > > > >                                         void *__filler;
> > > > > >                                         /* Count page's or folio's mlocks */
> > > > > >                                         unsigned int mlock_count;
> > > > > >                                 };
> > > > > >
> > > > > >                                 /* Or, free page */
> > > > > >                                 struct list_head buddy_list;
> > > > > >                                 struct list_head pcp_list;
> > > > > >                         };
> > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > >                         struct address_space *mapping;
> > > > > >                         union {
> > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > >                         };
> > > > > >                         /**
> > > > > >                          * @private: Mapping-private opaque data.
> > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > >                          */
> > > > > >                         unsigned long private;
> > > > > >                 };
> > > > > >
> > > > > > But within the page pool struct, we have a variable called
> > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > >
> > > > > >                 struct {        /* page_pool used by netstack */
> > > > > >                         /**
> > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > >                          * page_pool allocated pages.
> > > > > >                          */
> > > > > >                         unsigned long pp_magic;
> > > > > >                         struct page_pool *pp;
> > > > > >                         unsigned long _pp_mapping_pad;
> > > > > >                         unsigned long dma_addr;
> > > > > >                         atomic_long_t pp_ref_count;
> > > > > >                 };
> > > > > >
> > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > So this patch replaces the "private" with "pp".
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > >
> > > > > Instead of doing a customized version of page pool, can we simply
> > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > bother the dma stuffs.
> > > >
> > > >
> > > > The page pool needs to do the dma by the DMA APIs.
> > > > So we can not use the page pool directly.
> > >
> > > I found this:
> > >
> > > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > >                                         * map/unmap
> > >
> > > It seems to work here?
> >
> >
> > I have studied the page pool mechanism and believe that we cannot use it
> > directly. We can make the page pool to bypass the DMA operations.
> > This allows us to handle DMA within virtio-net for pages allocated from the page
> > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > to the page.
> >
> > However, the critical issue pertains to unmapping. Ideally, we want to return
> > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > unmapping and remapping steps.
> >
> > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > and releases the pages. When the pool hits its capacity, pages are relinquished
> > without a chance for unmapping.
>
> Technically, when ptr_ring is full there could be a fallback, but then
> it requires expensive synchronization between producer and consumer.
> For virtio-net, it might not be a problem because add/get has been
> synchronized. (It might be relaxed in the future, actually we've
> already seen a requirement in the past for virtio-blk).

The point is that the page will be released by page pool directly,
we will have no chance to unmap that, if we work with page pool.

>
> > If we were to unmap pages each time before
> > returning them to the pool, we would negate the benefits of bypassing the
> > mapping and unmapping process altogether.
>
> Yes, but the problem in this approach is that it creates a corner
> exception where dma_addr is used outside the page pool.

YES. This is a corner exception. We need to introduce this case to the page
pool.

So for introducing the page-pool to virtio-net(not only for big mode),
we may need to push the page-pool to support dma by drivers.

Back to this patch set, I think we should keep the virtio-net to manage
the pages.

What do you think?

Thanks

>
> Maybe for big mode it doesn't matter too much if there's no
> performance improvement.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-15  8:36             ` Xuan Zhuo
@ 2024-04-15  8:56               ` Jason Wang
  2024-04-15  8:59                 ` Xuan Zhuo
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-15  8:56 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	Jesper Dangaard Brouer

On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > > >
> > > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > > variable for storing the DMA addr.
> > > > > > >
> > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > >                         /**
> > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > >                          * by the page owner.
> > > > > > >                          */
> > > > > > >                         union {
> > > > > > >                                 struct list_head lru;
> > > > > > >
> > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > >                                 struct {
> > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > >                                         void *__filler;
> > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > >                                         unsigned int mlock_count;
> > > > > > >                                 };
> > > > > > >
> > > > > > >                                 /* Or, free page */
> > > > > > >                                 struct list_head buddy_list;
> > > > > > >                                 struct list_head pcp_list;
> > > > > > >                         };
> > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > >                         struct address_space *mapping;
> > > > > > >                         union {
> > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > >                         };
> > > > > > >                         /**
> > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > >                          */
> > > > > > >                         unsigned long private;
> > > > > > >                 };
> > > > > > >
> > > > > > > But within the page pool struct, we have a variable called
> > > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > > >
> > > > > > >                 struct {        /* page_pool used by netstack */
> > > > > > >                         /**
> > > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > > >                          * page_pool allocated pages.
> > > > > > >                          */
> > > > > > >                         unsigned long pp_magic;
> > > > > > >                         struct page_pool *pp;
> > > > > > >                         unsigned long _pp_mapping_pad;
> > > > > > >                         unsigned long dma_addr;
> > > > > > >                         atomic_long_t pp_ref_count;
> > > > > > >                 };
> > > > > > >
> > > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > > So this patch replaces the "private" with "pp".
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > >
> > > > > > Instead of doing a customized version of page pool, can we simply
> > > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > > bother the dma stuffs.
> > > > >
> > > > >
> > > > > The page pool needs to do the dma by the DMA APIs.
> > > > > So we can not use the page pool directly.
> > > >
> > > > I found this:
> > > >
> > > > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > > >                                         * map/unmap
> > > >
> > > > It seems to work here?
> > >
> > >
> > > I have studied the page pool mechanism and believe that we cannot use it
> > > directly. We can make the page pool to bypass the DMA operations.
> > > This allows us to handle DMA within virtio-net for pages allocated from the page
> > > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > > to the page.
> > >
> > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > unmapping and remapping steps.
> > >
> > > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > > and releases the pages. When the pool hits its capacity, pages are relinquished
> > > without a chance for unmapping.
> >
> > Technically, when ptr_ring is full there could be a fallback, but then
> > it requires expensive synchronization between producer and consumer.
> > For virtio-net, it might not be a problem because add/get has been
> > synchronized. (It might be relaxed in the future, actually we've
> > already seen a requirement in the past for virtio-blk).
>
> The point is that the page will be released by page pool directly,
> we will have no chance to unmap that, if we work with page pool.

I mean if we have a fallback, there would be no need to release these
pages but put them into a link list.

>
> >
> > > If we were to unmap pages each time before
> > > returning them to the pool, we would negate the benefits of bypassing the
> > > mapping and unmapping process altogether.
> >
> > Yes, but the problem in this approach is that it creates a corner
> > exception where dma_addr is used outside the page pool.
>
> YES. This is a corner exception. We need to introduce this case to the page
> pool.
>
> So for introducing the page-pool to virtio-net(not only for big mode),
> we may need to push the page-pool to support dma by drivers.

Adding Jesper for some comments.

>
> Back to this patch set, I think we should keep the virtio-net to manage
> the pages.
>
> What do you think?

I might be wrong, but I think if we need to either

1) seek a way to manage the pages by yourself but not touching page
pool metadata (or Jesper is fine with this)
2) optimize the unmap for page pool

or even

3) just do dma_unmap before returning the page back to the page pool,
we don't get all the benefits of page pool but we end up with simple
codes (no fallback for premapping).
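
A minimal sketch of option 3 (assumed names, not code from the series;
rq->page_pool is a hypothetical field): unmap through the virtqueue DMA helper
first, then hand the page back to the pool, so the pool never holds a page
with a live driver-side mapping.

static void virtnet_pp_unmap_and_put(struct receive_queue *rq,
				     struct page *page)
{
	dma_addr_t dma = page_pool_get_dma_addr(page);

	virtqueue_dma_unmap_page_attrs(rq->vq, dma, PAGE_SIZE,
				       DMA_FROM_DEVICE, 0);
	page_pool_set_dma_addr(page, 0);
	page_pool_put_full_page(rq->page_pool, page, false);
}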

Thanks


>
> Thanks
>
> >
> > Maybe for big mode it doesn't matter too much if there's no
> > performance improvement.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-15  8:56               ` Jason Wang
@ 2024-04-15  8:59                 ` Xuan Zhuo
  2024-04-16  3:24                   ` Jason Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-15  8:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	Jesper Dangaard Brouer

On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > > > >
> > > > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > > > variable for storing the DMA addr.
> > > > > > > >
> > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > >                         /**
> > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > >                          * by the page owner.
> > > > > > > >                          */
> > > > > > > >                         union {
> > > > > > > >                                 struct list_head lru;
> > > > > > > >
> > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > >                                 struct {
> > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > >                                         void *__filler;
> > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > >                                         unsigned int mlock_count;
> > > > > > > >                                 };
> > > > > > > >
> > > > > > > >                                 /* Or, free page */
> > > > > > > >                                 struct list_head buddy_list;
> > > > > > > >                                 struct list_head pcp_list;
> > > > > > > >                         };
> > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > >                         struct address_space *mapping;
> > > > > > > >                         union {
> > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > >                         };
> > > > > > > >                         /**
> > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > >                          */
> > > > > > > >                         unsigned long private;
> > > > > > > >                 };
> > > > > > > >
> > > > > > > > But within the page pool struct, we have a variable called
> > > > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > > > >
> > > > > > > >                 struct {        /* page_pool used by netstack */
> > > > > > > >                         /**
> > > > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > > > >                          * page_pool allocated pages.
> > > > > > > >                          */
> > > > > > > >                         unsigned long pp_magic;
> > > > > > > >                         struct page_pool *pp;
> > > > > > > >                         unsigned long _pp_mapping_pad;
> > > > > > > >                         unsigned long dma_addr;
> > > > > > > >                         atomic_long_t pp_ref_count;
> > > > > > > >                 };
> > > > > > > >
> > > > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > > > So this patch replaces the "private" with "pp".
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > ---
> > > > > > >
> > > > > > > Instead of doing a customized version of page pool, can we simply
> > > > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > > > bother the dma stuffs.
> > > > > >
> > > > > >
> > > > > > The page pool needs to do the dma by the DMA APIs.
> > > > > > So we can not use the page pool directly.
> > > > >
> > > > > I found this:
> > > > >
> > > > > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > > > >                                         * map/unmap
> > > > >
> > > > > It seems to work here?
> > > >
> > > >
> > > > I have studied the page pool mechanism and believe that we cannot use it
> > > > directly. We can make the page pool to bypass the DMA operations.
> > > > This allows us to handle DMA within virtio-net for pages allocated from the page
> > > > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > > > to the page.
> > > >
> > > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > > unmapping and remapping steps.
> > > >
> > > > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > > > and releases the pages. When the pool hits its capacity, pages are relinquished
> > > > without a chance for unmapping.
> > >
> > > Technically, when ptr_ring is full there could be a fallback, but then
> > > it requires expensive synchronization between producer and consumer.
> > > For virtio-net, it might not be a problem because add/get has been
> > > synchronized. (It might be relaxed in the future, actually we've
> > > already seen a requirement in the past for virtio-blk).
> >
> > The point is that the page will be released by page pool directly,
> > we will have no chance to unmap that, if we work with page pool.
>
> I mean if we have a fallback, there would be no need to release these
> pages but put them into a link list.


What fallback?

If we put the pages into a link list, why would we use the page pool?


>
> >
> > >
> > > > If we were to unmap pages each time before
> > > > returning them to the pool, we would negate the benefits of bypassing the
> > > > mapping and unmapping process altogether.
> > >
> > > Yes, but the problem in this approach is that it creates a corner
> > > exception where dma_addr is used outside the page pool.
> >
> > YES. This is a corner exception. We need to introduce this case to the page
> > pool.
> >
> > So for introducing the page-pool to virtio-net(not only for big mode),
> > we may need to push the page-pool to support dma by drivers.
>
> Adding Jesper for some comments.
>
> >
> > Back to this patch set, I think we should keep the virtio-net to manage
> > the pages.
> >
> > What do you think?
>
> I might be wrong, but I think if we need to either
>
> 1) seek a way to manage the pages by yourself but not touching page
> pool metadata (or Jesper is fine with this)

Do you mean working with page pool or not?

If we manage the pages ourselves (no page pool), we do not care whether the
metadata is for the page pool or not. We just use the space in the pages, like
the "private" field.
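
For reference, a sketch of what that looks like with the layout argued for in
the quoted commit message (the field choice follows the commit message; the
helper bodies here are assumptions):

/* Assumed helper bodies: the chain pointer reuses page->pp and the
 * premapped address reuses page->dma_addr, both from the page_pool
 * sub-struct of struct page.
 */
static void page_chain_add(struct page *p, struct page *next)
{
	p->pp = (struct page_pool *)next;
}

static struct page *page_chain_next(struct page *p)
{
	return (struct page *)p->pp;
}

static unsigned long page_dma_addr(struct page *p)
{
	return p->dma_addr;
}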


> 2) optimize the unmap for page pool
>
> or even
>
> 3) just do dma_unmap before returning the page back to the page pool,
> we don't get all the benefits of page pool but we end up with simple
> codes (no fallback for premapping).

I am ok for this.


Thanks.

>
> Thanks
>
>
> >
> > Thanks
> >
> > >
> > > Maybe for big mode it doesn't matter too much if there's no
> > > performance improvement.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-15  8:59                 ` Xuan Zhuo
@ 2024-04-16  3:24                   ` Jason Wang
  2024-04-17  1:30                     ` Xuan Zhuo
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-16  3:24 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	Jesper Dangaard Brouer

On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > > > > >
> > > > > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > > > > variable for storing the DMA addr.
> > > > > > > > >
> > > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > > >                         /**
> > > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > > >                          * by the page owner.
> > > > > > > > >                          */
> > > > > > > > >                         union {
> > > > > > > > >                                 struct list_head lru;
> > > > > > > > >
> > > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > > >                                 struct {
> > > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > > >                                         void *__filler;
> > > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > > >                                         unsigned int mlock_count;
> > > > > > > > >                                 };
> > > > > > > > >
> > > > > > > > >                                 /* Or, free page */
> > > > > > > > >                                 struct list_head buddy_list;
> > > > > > > > >                                 struct list_head pcp_list;
> > > > > > > > >                         };
> > > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > > >                         struct address_space *mapping;
> > > > > > > > >                         union {
> > > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > > >                         };
> > > > > > > > >                         /**
> > > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > > >                          */
> > > > > > > > >                         unsigned long private;
> > > > > > > > >                 };
> > > > > > > > >
> > > > > > > > > But within the page pool struct, we have a variable called
> > > > > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > > > > >
> > > > > > > > >                 struct {        /* page_pool used by netstack */
> > > > > > > > >                         /**
> > > > > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > > > > >                          * page_pool allocated pages.
> > > > > > > > >                          */
> > > > > > > > >                         unsigned long pp_magic;
> > > > > > > > >                         struct page_pool *pp;
> > > > > > > > >                         unsigned long _pp_mapping_pad;
> > > > > > > > >                         unsigned long dma_addr;
> > > > > > > > >                         atomic_long_t pp_ref_count;
> > > > > > > > >                 };
> > > > > > > > >
> > > > > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > > > > So this patch replaces the "private" with "pp".
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > ---
> > > > > > > >
> > > > > > > > Instead of doing a customized version of page pool, can we simply
> > > > > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > > > > bother the dma stuffs.
> > > > > > >
> > > > > > >
> > > > > > > The page pool needs to do the dma by the DMA APIs.
> > > > > > > So we can not use the page pool directly.
> > > > > >
> > > > > > I found this:
> > > > > >
> > > > > > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > > > > >                                         * map/unmap
> > > > > >
> > > > > > It seems to work here?
> > > > >
> > > > >
> > > > > I have studied the page pool mechanism and believe that we cannot use it
> > > > > directly. We can make the page pool to bypass the DMA operations.
> > > > > This allows us to handle DMA within virtio-net for pages allocated from the page
> > > > > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > > > > to the page.
> > > > >
> > > > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > > > unmapping and remapping steps.
> > > > >
> > > > > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > > > > and releases the pages. When the pool hits its capacity, pages are relinquished
> > > > > without a chance for unmapping.
> > > >
> > > > Technically, when ptr_ring is full there could be a fallback, but then
> > > > it requires expensive synchronization between producer and consumer.
> > > > For virtio-net, it might not be a problem because add/get has been
> > > > synchronized. (It might be relaxed in the future, actually we've
> > > > already seen a requirement in the past for virtio-blk).
> > >
> > > The point is that the page will be released by page pool directly,
> > > we will have no chance to unmap that, if we work with page pool.
> >
> > I mean if we have a fallback, there would be no need to release these
> > pages but put them into a link list.
>
>
> What fallback?

https://lore.kernel.org/netdev/1519607771-20613-1-git-send-email-mst@redhat.com/

>
> If we put the pages into a link list, why would we use the page pool?

The size of the cache and ptr_ring needs to be fixed.

Again, as explained above, it needs more benchmarks and looks like a
separate topic.

>
>
> >
> > >
> > > >
> > > > > If we were to unmap pages each time before
> > > > > returning them to the pool, we would negate the benefits of bypassing the
> > > > > mapping and unmapping process altogether.
> > > >
> > > > Yes, but the problem in this approach is that it creates a corner
> > > > exception where dma_addr is used outside the page pool.
> > >
> > > YES. This is a corner exception. We need to introduce this case to the page
> > > pool.
> > >
> > > So for introducing the page-pool to virtio-net(not only for big mode),
> > > we may need to push the page-pool to support dma by drivers.
> >
> > Adding Jesper for some comments.
> >
> > >
> > > Back to this patch set, I think we should keep the virtio-net to manage
> > > the pages.
> > >
> > > What do you think?
> >
> > I might be wrong, but I think if we need to either
> >
> > 1) seek a way to manage the pages by yourself but not touching page
> > pool metadata (or Jesper is fine with this)
>
> Do you mean working with page pool or not?
>

I meant if Jesper is fine with reusing page pool metadata like this patch.

> If we manage the pages by self(no page pool), we do not care the metadata is for
> page pool or not. We just use the space of pages like the "private".

That's also fine.

>
>
> > 2) optimize the unmap for page pool
> >
> > or even
> >
> > 3) just do dma_unmap before returning the page back to the page pool,
> > we don't get all the benefits of page pool but we end up with simple
> > codes (no fallback for premapping).
>
> I am ok for this.

Right, we just need to make sure there's no performance regression,
then it would be fine.

I see for example mana did this as well.
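
To be concrete, option 3 would look roughly like the sketch below. This is
only an illustration with a made-up "struct my_rq"; in virtio-net the unmap
would go through the virtio core DMA wrappers rather than dma_unmap_page()
directly.

#include <net/page_pool/helpers.h>
#include <linux/dma-mapping.h>

struct my_rq {
	struct device *dma_dev;
	struct page_pool *pool;
};

/* Unmap the premapped page before giving it back to the page pool, so the
 * pool itself never has to know about the mapping.
 */
static void rx_recycle_page(struct my_rq *rq, struct page *page)
{
	dma_addr_t addr = page_pool_get_dma_addr(page);

	dma_unmap_page(rq->dma_dev, addr, PAGE_SIZE, DMA_FROM_DEVICE);
	page_pool_set_dma_addr(page, 0);
	page_pool_put_full_page(rq->pool, page, false);
}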

Thanks

>
>
> Thanks.
>
> >
> > Thanks
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Maybe for big mode it doesn't matter too much if there's no
> > > > performance improvement.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-16  3:24                   ` Jason Wang
@ 2024-04-17  1:30                     ` Xuan Zhuo
  2024-04-17  4:08                       ` Jason Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-17  1:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	Jesper Dangaard Brouer

On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > > > > > >
> > > > > > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > > > > > variable for storing the DMA addr.
> > > > > > > > > >
> > > > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > > > >                         /**
> > > > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > > > >                          * by the page owner.
> > > > > > > > > >                          */
> > > > > > > > > >                         union {
> > > > > > > > > >                                 struct list_head lru;
> > > > > > > > > >
> > > > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > > > >                                 struct {
> > > > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > > > >                                         void *__filler;
> > > > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > > > >                                         unsigned int mlock_count;
> > > > > > > > > >                                 };
> > > > > > > > > >
> > > > > > > > > >                                 /* Or, free page */
> > > > > > > > > >                                 struct list_head buddy_list;
> > > > > > > > > >                                 struct list_head pcp_list;
> > > > > > > > > >                         };
> > > > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > > > >                         struct address_space *mapping;
> > > > > > > > > >                         union {
> > > > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > > > >                         };
> > > > > > > > > >                         /**
> > > > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > > > >                          */
> > > > > > > > > >                         unsigned long private;
> > > > > > > > > >                 };
> > > > > > > > > >
> > > > > > > > > > But within the page pool struct, we have a variable called
> > > > > > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > > > > > >
> > > > > > > > > >                 struct {        /* page_pool used by netstack */
> > > > > > > > > >                         /**
> > > > > > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > > > > > >                          * page_pool allocated pages.
> > > > > > > > > >                          */
> > > > > > > > > >                         unsigned long pp_magic;
> > > > > > > > > >                         struct page_pool *pp;
> > > > > > > > > >                         unsigned long _pp_mapping_pad;
> > > > > > > > > >                         unsigned long dma_addr;
> > > > > > > > > >                         atomic_long_t pp_ref_count;
> > > > > > > > > >                 };
> > > > > > > > > >
> > > > > > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > > > > > So this patch replaces the "private" with "pp".
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > Instead of doing a customized version of page pool, can we simply
> > > > > > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > > > > > bother the dma stuffs.
> > > > > > > >
> > > > > > > >
> > > > > > > > The page pool needs to do the dma by the DMA APIs.
> > > > > > > > So we can not use the page pool directly.
> > > > > > >
> > > > > > > I found this:
> > > > > > >
> > > > > > > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > > > > > >                                         * map/unmap
> > > > > > >
> > > > > > > It seems to work here?
> > > > > >
> > > > > >
> > > > > > I have studied the page pool mechanism and believe that we cannot use it
> > > > > > directly. We can make the page pool to bypass the DMA operations.
> > > > > > This allows us to handle DMA within virtio-net for pages allocated from the page
> > > > > > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > > > > > to the page.
> > > > > >
> > > > > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > > > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > > > > unmapping and remapping steps.
> > > > > >
> > > > > > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > > > > > and releases the pages. When the pool hits its capacity, pages are relinquished
> > > > > > without a chance for unmapping.
> > > > >
> > > > > Technically, when ptr_ring is full there could be a fallback, but then
> > > > > it requires expensive synchronization between producer and consumer.
> > > > > For virtio-net, it might not be a problem because add/get has been
> > > > > synchronized. (It might be relaxed in the future, actually we've
> > > > > already seen a requirement in the past for virito-blk).
> > > >
> > > > The point is that the page will be released by page pool directly,
> > > > we will have no change to unmap that, if we work with page pool.
> > >
> > > I mean if we have a fallback, there would be no need to release these
> > > pages but put them into a link list.
> >
> >
> > What fallback?
>
> https://lore.kernel.org/netdev/1519607771-20613-1-git-send-email-mst@redhat.com/
>
> >
> > If we put the pages to the link list, why we use the page pool?
>
> The size of the cache and ptr_ring needs to be fixed.
>
> Again, as explained above, it needs more benchmarks and looks like a
> separate topic.
>
> >
> >
> > >
> > > >
> > > > >
> > > > > > If we were to unmap pages each time before
> > > > > > returning them to the pool, we would negate the benefits of bypassing the
> > > > > > mapping and unmapping process altogether.
> > > > >
> > > > > Yes, but the problem in this approach is that it creates a corner
> > > > > exception where dma_addr is used outside the page pool.
> > > >
> > > > YES. This is a corner exception. We need to introduce this case to the page
> > > > pool.
> > > >
> > > > So for introducing the page-pool to virtio-net(not only for big mode),
> > > > we may need to push the page-pool to support dma by drivers.
> > >
> > > Adding Jesper for some comments.
> > >
> > > >
> > > > Back to this patch set, I think we should keep the virtio-net to manage
> > > > the pages.
> > > >
> > > > What do you think?
> > >
> > > I might be wrong, but I think if we need to either
> > >
> > > 1) seek a way to manage the pages by yourself but not touching page
> > > pool metadata (or Jesper is fine with this)
> >
> > Do you mean working with page pool or not?
> >
>
> I meant if Jesper is fine with reusing page pool metadata like this patch.
>
> > If we manage the pages by self(no page pool), we do not care the metadata is for
> > page pool or not. We just use the space of pages like the "private".
>
> That's also fine.
>
> >
> >
> > > 2) optimize the unmap for page pool
> > >
> > > or even
> > >
> > > 3) just do dma_unmap before returning the page back to the page pool,
> > > we don't get all the benefits of page pool but we end up with simple
> > > codes (no fallback for premapping).
> >
> > I am ok for this.
>
> Right, we just need to make sure there's no performance regression,
> then it would be fine.
>
> I see for example mana did this as well.

I do not think we should use the page pool directly for now,
because mana does not need any extra space to store the dma address,
while we must store the dma address for unmapping.

If we use the page pool without PP_FLAG_DMA_MAP and then store the dma
address in page.dma_addr ourselves, I think that is not safe.

I think the approach of this patch set is fine. We just reuse the space
inside the page, whether it is a page pool page or not, to store the
chain link and the dma address.
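
To make this concrete, below is a minimal sketch of what I mean. The exact
field choice is illustrative (64-bit dma_addr assumed for simplicity), not
necessarily what the final patch does.

#include <linux/types.h>
#include <linux/mm_types.h>

/* Chain big-mode pages and remember the premapped DMA address in the
 * page_pool sub-struct fields of struct page, instead of page->private.
 * The "pp" pointer is simply reused as the next-page link.
 */
static void page_chain_add(struct page *p, struct page *next)
{
	p->pp = (struct page_pool *)next;
}

static struct page *page_chain_next(struct page *p)
{
	return (struct page *)p->pp;
}

static void page_dma_addr_set(struct page *p, dma_addr_t addr)
{
	p->dma_addr = addr;
}

static dma_addr_t page_dma_addr(struct page *p)
{
	return p->dma_addr;
}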

Thanks.

>
> Thanks
>
> >
> >
> > Thanks.
> >
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Maybe for big mode it doesn't matter too much if there's no
> > > > > performance improvement.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-17  1:30                     ` Xuan Zhuo
@ 2024-04-17  4:08                       ` Jason Wang
  2024-04-17  8:20                         ` Xuan Zhuo
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-17  4:08 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	Jesper Dangaard Brouer

On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > > > > > > >
> > > > > > > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > > > > > > variable for storing the DMA addr.
> > > > > > > > > > >
> > > > > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > > > > >                         /**
> > > > > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > > > > >                          * by the page owner.
> > > > > > > > > > >                          */
> > > > > > > > > > >                         union {
> > > > > > > > > > >                                 struct list_head lru;
> > > > > > > > > > >
> > > > > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > > > > >                                 struct {
> > > > > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > > > > >                                         void *__filler;
> > > > > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > > > > >                                         unsigned int mlock_count;
> > > > > > > > > > >                                 };
> > > > > > > > > > >
> > > > > > > > > > >                                 /* Or, free page */
> > > > > > > > > > >                                 struct list_head buddy_list;
> > > > > > > > > > >                                 struct list_head pcp_list;
> > > > > > > > > > >                         };
> > > > > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > > > > >                         struct address_space *mapping;
> > > > > > > > > > >                         union {
> > > > > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > > > > >                         };
> > > > > > > > > > >                         /**
> > > > > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > > > > >                          */
> > > > > > > > > > >                         unsigned long private;
> > > > > > > > > > >                 };
> > > > > > > > > > >
> > > > > > > > > > > But within the page pool struct, we have a variable called
> > > > > > > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > > > > > > >
> > > > > > > > > > >                 struct {        /* page_pool used by netstack */
> > > > > > > > > > >                         /**
> > > > > > > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > > > > > > >                          * page_pool allocated pages.
> > > > > > > > > > >                          */
> > > > > > > > > > >                         unsigned long pp_magic;
> > > > > > > > > > >                         struct page_pool *pp;
> > > > > > > > > > >                         unsigned long _pp_mapping_pad;
> > > > > > > > > > >                         unsigned long dma_addr;
> > > > > > > > > > >                         atomic_long_t pp_ref_count;
> > > > > > > > > > >                 };
> > > > > > > > > > >
> > > > > > > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > > > > > > So this patch replaces the "private" with "pp".
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > ---
> > > > > > > > > >
> > > > > > > > > > Instead of doing a customized version of page pool, can we simply
> > > > > > > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > > > > > > bother the dma stuffs.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The page pool needs to do the dma by the DMA APIs.
> > > > > > > > > So we can not use the page pool directly.
> > > > > > > >
> > > > > > > > I found this:
> > > > > > > >
> > > > > > > > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > > > > > > >                                         * map/unmap
> > > > > > > >
> > > > > > > > It seems to work here?
> > > > > > >
> > > > > > >
> > > > > > > I have studied the page pool mechanism and believe that we cannot use it
> > > > > > > directly. We can make the page pool to bypass the DMA operations.
> > > > > > > This allows us to handle DMA within virtio-net for pages allocated from the page
> > > > > > > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > > > > > > to the page.
> > > > > > >
> > > > > > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > > > > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > > > > > unmapping and remapping steps.
> > > > > > >
> > > > > > > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > > > > > > and releases the pages. When the pool hits its capacity, pages are relinquished
> > > > > > > without a chance for unmapping.
> > > > > >
> > > > > > Technically, when ptr_ring is full there could be a fallback, but then
> > > > > > it requires expensive synchronization between producer and consumer.
> > > > > > For virtio-net, it might not be a problem because add/get has been
> > > > > > synchronized. (It might be relaxed in the future, actually we've
> > > > > > already seen a requirement in the past for virito-blk).
> > > > >
> > > > > The point is that the page will be released by page pool directly,
> > > > > we will have no change to unmap that, if we work with page pool.
> > > >
> > > > I mean if we have a fallback, there would be no need to release these
> > > > pages but put them into a link list.
> > >
> > >
> > > What fallback?
> >
> > https://lore.kernel.org/netdev/1519607771-20613-1-git-send-email-mst@redhat.com/
> >
> > >
> > > If we put the pages to the link list, why we use the page pool?
> >
> > The size of the cache and ptr_ring needs to be fixed.
> >
> > Again, as explained above, it needs more benchmarks and looks like a
> > separate topic.
> >
> > >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > If we were to unmap pages each time before
> > > > > > > returning them to the pool, we would negate the benefits of bypassing the
> > > > > > > mapping and unmapping process altogether.
> > > > > >
> > > > > > Yes, but the problem in this approach is that it creates a corner
> > > > > > exception where dma_addr is used outside the page pool.
> > > > >
> > > > > YES. This is a corner exception. We need to introduce this case to the page
> > > > > pool.
> > > > >
> > > > > So for introducing the page-pool to virtio-net(not only for big mode),
> > > > > we may need to push the page-pool to support dma by drivers.
> > > >
> > > > Adding Jesper for some comments.
> > > >
> > > > >
> > > > > Back to this patch set, I think we should keep the virtio-net to manage
> > > > > the pages.
> > > > >
> > > > > What do you think?
> > > >
> > > > I might be wrong, but I think if we need to either
> > > >
> > > > 1) seek a way to manage the pages by yourself but not touching page
> > > > pool metadata (or Jesper is fine with this)
> > >
> > > Do you mean working with page pool or not?
> > >
> >
> > I meant if Jesper is fine with reusing page pool metadata like this patch.
> >
> > > If we manage the pages by self(no page pool), we do not care the metadata is for
> > > page pool or not. We just use the space of pages like the "private".
> >
> > That's also fine.
> >
> > >
> > >
> > > > 2) optimize the unmap for page pool
> > > >
> > > > or even
> > > >
> > > > 3) just do dma_unmap before returning the page back to the page pool,
> > > > we don't get all the benefits of page pool but we end up with simple
> > > > codes (no fallback for premapping).
> > >
> > > I am ok for this.
> >
> > Right, we just need to make sure there's no performance regression,
> > then it would be fine.
> >
> > I see for example mana did this as well.
>
> I think we should not use page pool directly now,
> because the mana does not need a space to store the dma address.
> We need to store the dma address for unmapping.
>
> If we use page pool without PP_FLAG_DMA_MAP, then store the dma address by
> page.dma_addr, I think that is not safe.

Jesper, could you comment on this?

>
> I think the way of this patch set is fine.

So it reuses the page_pool sub-struct inside the page structure for another use case.

> We just use the
> space of the page whatever it is page pool or not to store
> the link and dma address.

Probably because we've already "abused" page->private. I would leave
it for other maintainers to decide.

Thanks

>
> Thanks.
>
> >
> > Thanks
> >
> > >
> > >
> > > Thanks.
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Maybe for big mode it doesn't matter too much if there's no
> > > > > > performance improvement.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-17  4:08                       ` Jason Wang
@ 2024-04-17  8:20                         ` Xuan Zhuo
  2024-04-18  4:15                           ` Jason Wang
  2024-04-18 20:19                           ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-17  8:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	Jesper Dangaard Brouer

On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > > > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > > > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > > > > > > > >
> > > > > > > > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > > > > > > > variable for storing the DMA addr.
> > > > > > > > > > > >
> > > > > > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > > > > > >                         /**
> > > > > > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > > > > > >                          * by the page owner.
> > > > > > > > > > > >                          */
> > > > > > > > > > > >                         union {
> > > > > > > > > > > >                                 struct list_head lru;
> > > > > > > > > > > >
> > > > > > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > > > > > >                                 struct {
> > > > > > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > > > > > >                                         void *__filler;
> > > > > > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > > > > > >                                         unsigned int mlock_count;
> > > > > > > > > > > >                                 };
> > > > > > > > > > > >
> > > > > > > > > > > >                                 /* Or, free page */
> > > > > > > > > > > >                                 struct list_head buddy_list;
> > > > > > > > > > > >                                 struct list_head pcp_list;
> > > > > > > > > > > >                         };
> > > > > > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > > > > > >                         struct address_space *mapping;
> > > > > > > > > > > >                         union {
> > > > > > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > > > > > >                         };
> > > > > > > > > > > >                         /**
> > > > > > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > > > > > >                          */
> > > > > > > > > > > >                         unsigned long private;
> > > > > > > > > > > >                 };
> > > > > > > > > > > >
> > > > > > > > > > > > But within the page pool struct, we have a variable called
> > > > > > > > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > > > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > > > > > > > >
> > > > > > > > > > > >                 struct {        /* page_pool used by netstack */
> > > > > > > > > > > >                         /**
> > > > > > > > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > > > > > > > >                          * page_pool allocated pages.
> > > > > > > > > > > >                          */
> > > > > > > > > > > >                         unsigned long pp_magic;
> > > > > > > > > > > >                         struct page_pool *pp;
> > > > > > > > > > > >                         unsigned long _pp_mapping_pad;
> > > > > > > > > > > >                         unsigned long dma_addr;
> > > > > > > > > > > >                         atomic_long_t pp_ref_count;
> > > > > > > > > > > >                 };
> > > > > > > > > > > >
> > > > > > > > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > > > > > > > So this patch replaces the "private" with "pp".
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > ---
> > > > > > > > > > >
> > > > > > > > > > > Instead of doing a customized version of page pool, can we simply
> > > > > > > > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > > > > > > > bother the dma stuffs.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The page pool needs to do the dma by the DMA APIs.
> > > > > > > > > > So we can not use the page pool directly.
> > > > > > > > >
> > > > > > > > > I found this:
> > > > > > > > >
> > > > > > > > > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > > > > > > > >                                         * map/unmap
> > > > > > > > >
> > > > > > > > > It seems to work here?
> > > > > > > >
> > > > > > > >
> > > > > > > > I have studied the page pool mechanism and believe that we cannot use it
> > > > > > > > directly. We can make the page pool to bypass the DMA operations.
> > > > > > > > This allows us to handle DMA within virtio-net for pages allocated from the page
> > > > > > > > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > > > > > > > to the page.
> > > > > > > >
> > > > > > > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > > > > > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > > > > > > unmapping and remapping steps.
> > > > > > > >
> > > > > > > > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > > > > > > > and releases the pages. When the pool hits its capacity, pages are relinquished
> > > > > > > > without a chance for unmapping.
> > > > > > >
> > > > > > > Technically, when ptr_ring is full there could be a fallback, but then
> > > > > > > it requires expensive synchronization between producer and consumer.
> > > > > > > For virtio-net, it might not be a problem because add/get has been
> > > > > > > synchronized. (It might be relaxed in the future, actually we've
> > > > > > > already seen a requirement in the past for virito-blk).
> > > > > >
> > > > > > The point is that the page will be released by page pool directly,
> > > > > > we will have no change to unmap that, if we work with page pool.
> > > > >
> > > > > I mean if we have a fallback, there would be no need to release these
> > > > > pages but put them into a link list.
> > > >
> > > >
> > > > What fallback?
> > >
> > > https://lore.kernel.org/netdev/1519607771-20613-1-git-send-email-mst@redhat.com/
> > >
> > > >
> > > > If we put the pages to the link list, why we use the page pool?
> > >
> > > The size of the cache and ptr_ring needs to be fixed.
> > >
> > > Again, as explained above, it needs more benchmarks and looks like a
> > > separate topic.
> > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > If we were to unmap pages each time before
> > > > > > > > returning them to the pool, we would negate the benefits of bypassing the
> > > > > > > > mapping and unmapping process altogether.
> > > > > > >
> > > > > > > Yes, but the problem in this approach is that it creates a corner
> > > > > > > exception where dma_addr is used outside the page pool.
> > > > > >
> > > > > > YES. This is a corner exception. We need to introduce this case to the page
> > > > > > pool.
> > > > > >
> > > > > > So for introducing the page-pool to virtio-net(not only for big mode),
> > > > > > we may need to push the page-pool to support dma by drivers.
> > > > >
> > > > > Adding Jesper for some comments.
> > > > >
> > > > > >
> > > > > > Back to this patch set, I think we should keep the virtio-net to manage
> > > > > > the pages.
> > > > > >
> > > > > > What do you think?
> > > > >
> > > > > I might be wrong, but I think if we need to either
> > > > >
> > > > > 1) seek a way to manage the pages by yourself but not touching page
> > > > > pool metadata (or Jesper is fine with this)
> > > >
> > > > Do you mean working with page pool or not?
> > > >
> > >
> > > I meant if Jesper is fine with reusing page pool metadata like this patch.
> > >
> > > > If we manage the pages by self(no page pool), we do not care the metadata is for
> > > > page pool or not. We just use the space of pages like the "private".
> > >
> > > That's also fine.
> > >
> > > >
> > > >
> > > > > 2) optimize the unmap for page pool
> > > > >
> > > > > or even
> > > > >
> > > > > 3) just do dma_unmap before returning the page back to the page pool,
> > > > > we don't get all the benefits of page pool but we end up with simple
> > > > > codes (no fallback for premapping).
> > > >
> > > > I am ok for this.
> > >
> > > Right, we just need to make sure there's no performance regression,
> > > then it would be fine.
> > >
> > > I see for example mana did this as well.
> >
> > I think we should not use page pool directly now,
> > because the mana does not need a space to store the dma address.
> > We need to store the dma address for unmapping.
> >
> > If we use page pool without PP_FLAG_DMA_MAP, then store the dma address by
> > page.dma_addr, I think that is not safe.
>
> Jesper, could you comment on this?
>
> >
> > I think the way of this patch set is fine.
>
> So it reuses page pool structure in the page structure for another use case.
>
> > We just use the
> > space of the page whatever it is page pool or not to store
> > the link and dma address.
>
> Probably because we've already "abused" page->private. I would leave
> it for other maintainers to decide.

If we do not want to touch the page's fields directly,
the page pool is a good way.

But we must make the page pool work without PP_FLAG_DMA_MAP, because
virtio-net must use the DMA APIs wrapped by the virtio core.

And we still need to store the dma address ourselves, because virtio-net
cannot read it back from the descriptors directly.

@Jesper, can we set up the page pool without PP_FLAG_DMA_MAP
and call page_pool_set_dma_addr() from the virtio-net driver?
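
Something like the sketch below is what I am asking about. It is only meant
to illustrate the question; the dma_map_page() call stands in for the virtio
core DMA wrappers, and the pool would be created with flags = 0, i.e.
without PP_FLAG_DMA_MAP.

#include <net/page_pool/helpers.h>
#include <linux/dma-mapping.h>

/* The driver maps the page itself and records the address with
 * page_pool_set_dma_addr(), so it can unmap before the page goes back.
 */
static struct page *rx_alloc_premapped_page(struct device *dev,
					    struct page_pool *pool,
					    dma_addr_t *addr)
{
	struct page *page = page_pool_alloc_pages(pool, GFP_KERNEL);

	if (!page)
		return NULL;

	*addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, *addr)) {
		page_pool_put_full_page(pool, page, false);
		return NULL;
	}

	page_pool_set_dma_addr(page, *addr);
	return page;
}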

Thanks.



>
> Thanks
>
> >
> > Thanks.
> >
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Maybe for big mode it doesn't matter too much if there's no
> > > > > > > performance improvement.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-17  8:20                         ` Xuan Zhuo
@ 2024-04-18  4:15                           ` Jason Wang
  2024-04-18  4:16                             ` Jason Wang
  2024-04-18 20:19                           ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-18  4:15 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	Jesper Dangaard Brouer

On Wed, Apr 17, 2024 at 4:45 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > > > > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > > > > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > > > > > > > > variable for storing the DMA addr.
> > > > > > > > > > > > >
> > > > > > > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > > > > > > >                         /**
> > > > > > > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > > > > > > >                          * by the page owner.
> > > > > > > > > > > > >                          */
> > > > > > > > > > > > >                         union {
> > > > > > > > > > > > >                                 struct list_head lru;
> > > > > > > > > > > > >
> > > > > > > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > > > > > > >                                 struct {
> > > > > > > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > > > > > > >                                         void *__filler;
> > > > > > > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > > > > > > >                                         unsigned int mlock_count;
> > > > > > > > > > > > >                                 };
> > > > > > > > > > > > >
> > > > > > > > > > > > >                                 /* Or, free page */
> > > > > > > > > > > > >                                 struct list_head buddy_list;
> > > > > > > > > > > > >                                 struct list_head pcp_list;
> > > > > > > > > > > > >                         };
> > > > > > > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > > > > > > >                         struct address_space *mapping;
> > > > > > > > > > > > >                         union {
> > > > > > > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > > > > > > >                         };
> > > > > > > > > > > > >                         /**
> > > > > > > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > > > > > > >                          */
> > > > > > > > > > > > >                         unsigned long private;
> > > > > > > > > > > > >                 };
> > > > > > > > > > > > >
> > > > > > > > > > > > > But within the page pool struct, we have a variable called
> > > > > > > > > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > > > > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > > > > > > > > >
> > > > > > > > > > > > >                 struct {        /* page_pool used by netstack */
> > > > > > > > > > > > >                         /**
> > > > > > > > > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > > > > > > > > >                          * page_pool allocated pages.
> > > > > > > > > > > > >                          */
> > > > > > > > > > > > >                         unsigned long pp_magic;
> > > > > > > > > > > > >                         struct page_pool *pp;
> > > > > > > > > > > > >                         unsigned long _pp_mapping_pad;
> > > > > > > > > > > > >                         unsigned long dma_addr;
> > > > > > > > > > > > >                         atomic_long_t pp_ref_count;
> > > > > > > > > > > > >                 };
> > > > > > > > > > > > >
> > > > > > > > > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > > > > > > > > So this patch replaces the "private" with "pp".
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > >
> > > > > > > > > > > > Instead of doing a customized version of page pool, can we simply
> > > > > > > > > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > > > > > > > > bother the dma stuffs.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The page pool needs to do the dma by the DMA APIs.
> > > > > > > > > > > So we can not use the page pool directly.
> > > > > > > > > >
> > > > > > > > > > I found this:
> > > > > > > > > >
> > > > > > > > > > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > > > > > > > > >                                         * map/unmap
> > > > > > > > > >
> > > > > > > > > > It seems to work here?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I have studied the page pool mechanism and believe that we cannot use it
> > > > > > > > > directly. We can make the page pool to bypass the DMA operations.
> > > > > > > > > This allows us to handle DMA within virtio-net for pages allocated from the page
> > > > > > > > > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > > > > > > > > to the page.
> > > > > > > > >
> > > > > > > > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > > > > > > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > > > > > > > unmapping and remapping steps.
> > > > > > > > >
> > > > > > > > > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > > > > > > > > and releases the pages. When the pool hits its capacity, pages are relinquished
> > > > > > > > > without a chance for unmapping.
> > > > > > > >
> > > > > > > > Technically, when ptr_ring is full there could be a fallback, but then
> > > > > > > > it requires expensive synchronization between producer and consumer.
> > > > > > > > For virtio-net, it might not be a problem because add/get has been
> > > > > > > > synchronized. (It might be relaxed in the future, actually we've
> > > > > > > > already seen a requirement in the past for virito-blk).
> > > > > > >
> > > > > > > The point is that the page will be released by page pool directly,
> > > > > > > we will have no change to unmap that, if we work with page pool.
> > > > > >
> > > > > > I mean if we have a fallback, there would be no need to release these
> > > > > > pages but put them into a link list.
> > > > >
> > > > >
> > > > > What fallback?
> > > >
> > > > https://lore.kernel.org/netdev/1519607771-20613-1-git-send-email-mst@redhat.com/
> > > >
> > > > >
> > > > > If we put the pages to the link list, why we use the page pool?
> > > >
> > > > The size of the cache and ptr_ring needs to be fixed.
> > > >
> > > > Again, as explained above, it needs more benchmarks and looks like a
> > > > separate topic.
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > If we were to unmap pages each time before
> > > > > > > > > returning them to the pool, we would negate the benefits of bypassing the
> > > > > > > > > mapping and unmapping process altogether.
> > > > > > > >
> > > > > > > > Yes, but the problem in this approach is that it creates a corner
> > > > > > > > exception where dma_addr is used outside the page pool.
> > > > > > >
> > > > > > > YES. This is a corner exception. We need to introduce this case to the page
> > > > > > > pool.
> > > > > > >
> > > > > > > So for introducing the page-pool to virtio-net(not only for big mode),
> > > > > > > we may need to push the page-pool to support dma by drivers.
> > > > > >
> > > > > > Adding Jesper for some comments.
> > > > > >
> > > > > > >
> > > > > > > Back to this patch set, I think we should keep the virtio-net to manage
> > > > > > > the pages.
> > > > > > >
> > > > > > > What do you think?
> > > > > >
> > > > > > I might be wrong, but I think if we need to either
> > > > > >
> > > > > > 1) seek a way to manage the pages by yourself but not touching page
> > > > > > pool metadata (or Jesper is fine with this)
> > > > >
> > > > > Do you mean working with page pool or not?
> > > > >
> > > >
> > > > I meant if Jesper is fine with reusing page pool metadata like this patch.
> > > >
> > > > > If we manage the pages by self(no page pool), we do not care the metadata is for
> > > > > page pool or not. We just use the space of pages like the "private".
> > > >
> > > > That's also fine.
> > > >
> > > > >
> > > > >
> > > > > > 2) optimize the unmap for page pool
> > > > > >
> > > > > > or even
> > > > > >
> > > > > > 3) just do dma_unmap before returning the page back to the page pool,
> > > > > > we don't get all the benefits of page pool but we end up with simple
> > > > > > codes (no fallback for premapping).
> > > > >
> > > > > I am ok for this.
> > > >
> > > > Right, we just need to make sure there's no performance regression,
> > > > then it would be fine.
> > > >
> > > > I see for example mana did this as well.
> > >
> > > I think we should not use page pool directly now,
> > > because the mana does not need a space to store the dma address.
> > > We need to store the dma address for unmapping.
> > >
> > > If we use page pool without PP_FLAG_DMA_MAP, then store the dma address by
> > > page.dma_addr, I think that is not safe.
> >
> > Jesper, could you comment on this?
> >
> > >
> > > I think the way of this patch set is fine.
> >
> > So it reuses page pool structure in the page structure for another use case.
> >
> > > We just use the
> > > space of the page whatever it is page pool or not to store
> > > the link and dma address.
> >
> > Probably because we've already "abused" page->private. I would leave
> > it for other maintainers to decide.
>
> If we do not want to use the elements of the page directly,
> the page pool is a good way.
>

Rethinking this, I think the approach of this series should be fine.
It should be sufficient for virtio-net to guarantee that those pages
are never used by the page pool.
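
(For the record, my understanding is that the skb recycling path only treats
a page as a page_pool page when its pp_magic carries PP_SIGNATURE, roughly
the check below, quoted from memory; pages virtio-net allocates itself never
set pp_magic, so the core should not mistake them for pool pages.)

#include <linux/mm.h>

/* Roughly what the skb recycling path checks before recycling a page: */
static bool looks_like_pp_page(const struct page *page)
{
	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
}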

I will continue the review.

Thanks


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-18  4:15                           ` Jason Wang
@ 2024-04-18  4:16                             ` Jason Wang
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Wang @ 2024-04-18  4:16 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	Jesper Dangaard Brouer

On Thu, Apr 18, 2024 at 12:15 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Apr 17, 2024 at 4:45 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > > > > > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > > > > > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > > > > > > > > > variable for storing the DMA addr.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > > > > > > > >                         /**
> > > > > > > > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > > > > > > > >                          * by the page owner.
> > > > > > > > > > > > > >                          */
> > > > > > > > > > > > > >                         union {
> > > > > > > > > > > > > >                                 struct list_head lru;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > > > > > > > >                                 struct {
> > > > > > > > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > > > > > > > >                                         void *__filler;
> > > > > > > > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > > > > > > > >                                         unsigned int mlock_count;
> > > > > > > > > > > > > >                                 };
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >                                 /* Or, free page */
> > > > > > > > > > > > > >                                 struct list_head buddy_list;
> > > > > > > > > > > > > >                                 struct list_head pcp_list;
> > > > > > > > > > > > > >                         };
> > > > > > > > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > > > > > > > >                         struct address_space *mapping;
> > > > > > > > > > > > > >                         union {
> > > > > > > > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > > > > > > > >                         };
> > > > > > > > > > > > > >                         /**
> > > > > > > > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > > > > > > > >                          */
> > > > > > > > > > > > > >                         unsigned long private;
> > > > > > > > > > > > > >                 };
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > But within the page pool struct, we have a variable called
> > > > > > > > > > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > > > > > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >                 struct {        /* page_pool used by netstack */
> > > > > > > > > > > > > >                         /**
> > > > > > > > > > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > > > > > > > > > >                          * page_pool allocated pages.
> > > > > > > > > > > > > >                          */
> > > > > > > > > > > > > >                         unsigned long pp_magic;
> > > > > > > > > > > > > >                         struct page_pool *pp;
> > > > > > > > > > > > > >                         unsigned long _pp_mapping_pad;
> > > > > > > > > > > > > >                         unsigned long dma_addr;
> > > > > > > > > > > > > >                         atomic_long_t pp_ref_count;
> > > > > > > > > > > > > >                 };
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > > > > > > > > > So this patch replaces the "private" with "pp".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > >
> > > > > > > > > > > > > Instead of doing a customized version of page pool, can we simply
> > > > > > > > > > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > > > > > > > > > bother the dma stuffs.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The page pool needs to do the dma by the DMA APIs.
> > > > > > > > > > > > So we can not use the page pool directly.
> > > > > > > > > > >
> > > > > > > > > > > I found this:
> > > > > > > > > > >
> > > > > > > > > > > define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > > > > > > > > > >                                         * map/unmap
> > > > > > > > > > >
> > > > > > > > > > > It seems to work here?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I have studied the page pool mechanism and believe that we cannot use it
> > > > > > > > > > directly. We can make the page pool to bypass the DMA operations.
> > > > > > > > > > This allows us to handle DMA within virtio-net for pages allocated from the page
> > > > > > > > > > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > > > > > > > > > to the page.
> > > > > > > > > >
> > > > > > > > > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > > > > > > > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > > > > > > > > unmapping and remapping steps.
> > > > > > > > > >
> > > > > > > > > > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > > > > > > > > > and releases the pages. When the pool hits its capacity, pages are relinquished
> > > > > > > > > > without a chance for unmapping.
> > > > > > > > >
> > > > > > > > > Technically, when ptr_ring is full there could be a fallback, but then
> > > > > > > > > it requires expensive synchronization between producer and consumer.
> > > > > > > > > For virtio-net, it might not be a problem because add/get has been
> > > > > > > > > synchronized. (It might be relaxed in the future, actually we've
> > > > > > > > > already seen a requirement in the past for virito-blk).
> > > > > > > >
> > > > > > > > The point is that the page will be released by page pool directly,
> > > > > > > > we will have no change to unmap that, if we work with page pool.
> > > > > > >
> > > > > > > I mean if we have a fallback, there would be no need to release these
> > > > > > > pages but put them into a link list.
> > > > > >
> > > > > >
> > > > > > What fallback?
> > > > >
> > > > > https://lore.kernel.org/netdev/1519607771-20613-1-git-send-email-mst@redhat.com/
> > > > >
> > > > > >
> > > > > > If we put the pages to the link list, why we use the page pool?
> > > > >
> > > > > The size of the cache and ptr_ring needs to be fixed.
> > > > >
> > > > > Again, as explained above, it needs more benchmarks and looks like a
> > > > > separate topic.
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > If we were to unmap pages each time before
> > > > > > > > > > returning them to the pool, we would negate the benefits of bypassing the
> > > > > > > > > > mapping and unmapping process altogether.
> > > > > > > > >
> > > > > > > > > Yes, but the problem in this approach is that it creates a corner
> > > > > > > > > exception where dma_addr is used outside the page pool.
> > > > > > > >
> > > > > > > > YES. This is a corner exception. We need to introduce this case to the page
> > > > > > > > pool.
> > > > > > > >
> > > > > > > > So for introducing the page-pool to virtio-net(not only for big mode),
> > > > > > > > we may need to push the page-pool to support dma by drivers.
> > > > > > >
> > > > > > > Adding Jesper for some comments.
> > > > > > >
> > > > > > > >
> > > > > > > > Back to this patch set, I think we should keep the virtio-net to manage
> > > > > > > > the pages.
> > > > > > > >
> > > > > > > > What do you think?
> > > > > > >
> > > > > > > I might be wrong, but I think if we need to either
> > > > > > >
> > > > > > > 1) seek a way to manage the pages by yourself but not touching page
> > > > > > > pool metadata (or Jesper is fine with this)
> > > > > >
> > > > > > Do you mean working with page pool or not?
> > > > > >
> > > > >
> > > > > I meant if Jesper is fine with reusing page pool metadata like this patch.
> > > > >
> > > > > > If we manage the pages by self(no page pool), we do not care the metadata is for
> > > > > > page pool or not. We just use the space of pages like the "private".
> > > > >
> > > > > That's also fine.
> > > > >
> > > > > >
> > > > > >
> > > > > > > 2) optimize the unmap for page pool
> > > > > > >
> > > > > > > or even
> > > > > > >
> > > > > > > 3) just do dma_unmap before returning the page back to the page pool,
> > > > > > > we don't get all the benefits of page pool but we end up with simple
> > > > > > > codes (no fallback for premapping).
> > > > > >
> > > > > > I am ok for this.
> > > > >
> > > > > Right, we just need to make sure there's no performance regression,
> > > > > then it would be fine.
> > > > >
> > > > > I see for example mana did this as well.
> > > >
> > > > I think we should not use page pool directly now,
> > > > because the mana does not need a space to store the dma address.
> > > > We need to store the dma address for unmapping.
> > > >
> > > > If we use page pool without PP_FLAG_DMA_MAP, then store the dma address by
> > > > page.dma_addr, I think that is not safe.
> > >
> > > Jesper, could you comment on this?
> > >
> > > >
> > > > I think the way of this patch set is fine.
> > >
> > > So it reuses page pool structure in the page structure for another use case.
> > >
> > > > We just use the
> > > > space of the page whatever it is page pool or not to store
> > > > the link and dma address.
> > >
> > > Probably because we've already "abused" page->private. I would leave
> > > it for other maintainers to decide.
> >
> > If we do not want to use the elements of the page directly,
> > the page pool is a good way.
> >
>
> Rethink this, I think that the approach of this series should be fine.
> It should be sufficient for virtio-net to guarantee that those pages
> are not used by the page pool.
>
> I will continue the review.

Btw, it would be better to describe those design considerations in the
changelog (e.g. why we don't use page pool, etc.).

Thanks

>
> Thanks


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 1/6] virtio_ring: introduce dma map api for page
  2024-04-11  2:51 ` [PATCH vhost 1/6] virtio_ring: introduce dma map api for page Xuan Zhuo
  2024-04-11 11:45   ` Alexander Lobakin
@ 2024-04-18  6:08   ` Jason Wang
  1 sibling, 0 replies; 49+ messages in thread
From: Jason Wang @ 2024-04-18  6:08 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> The virtio-net big mode sq will use these APIs to map the pages.
>
> dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
>                                        size_t offset, size_t size,
>                                        enum dma_data_direction dir,
>                                        unsigned long attrs);
> void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
>                                    size_t size, enum dma_data_direction dir,
>                                    unsigned long attrs);
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks
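
A minimal usage sketch of the two helpers quoted above, as I read their
prototypes (the rq_map_rx_page()/rq_unmap_rx_page() names are my own and
not part of the series; error handling is trimmed):

static dma_addr_t rq_map_rx_page(struct virtqueue *vq, struct page *p)
{
	dma_addr_t addr;

	/* Map one full receive page for the device to write into. */
	addr = virtqueue_dma_map_page_attrs(vq, p, 0, PAGE_SIZE,
					    DMA_FROM_DEVICE, 0);
	if (virtqueue_dma_mapping_error(vq, addr))
		return DMA_MAPPING_ERROR;

	return addr;
}

static void rq_unmap_rx_page(struct virtqueue *vq, dma_addr_t addr)
{
	/* Undo the mapping once the CPU needs to read the page. */
	virtqueue_dma_unmap_page_attrs(vq, addr, PAGE_SIZE,
				       DMA_FROM_DEVICE, 0);
}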


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api
  2024-04-11  2:51 ` [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
@ 2024-04-18  6:09   ` Jason Wang
  2024-04-18  6:13   ` Jason Wang
  1 sibling, 0 replies; 49+ messages in thread
From: Jason Wang @ 2024-04-18  6:09 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now, we have virtio DMA APIs, the driver can be the premapped
> mode whatever the virtio core uses dma api or not.
>
> So remove the limit of checking use_dma_api from
> virtqueue_set_dma_premapped().
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-11  2:51 ` [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page Xuan Zhuo
  2024-04-12  4:47   ` Jason Wang
@ 2024-04-18  6:11   ` Jason Wang
  1 sibling, 0 replies; 49+ messages in thread
From: Jason Wang @ 2024-04-18  6:11 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now, we chain the pages of big mode by the page's private variable.
> But a subsequent patch aims to make the big mode to support
> premapped mode. This requires additional space to store the dma addr.
>
> Within the sub-struct that contains the 'private', there is no suitable
> variable for storing the DMA addr.
>
>                 struct {        /* Page cache and anonymous pages */
>                         /**
>                          * @lru: Pageout list, eg. active_list protected by
>                          * lruvec->lru_lock.  Sometimes used as a generic list
>                          * by the page owner.
>                          */
>                         union {
>                                 struct list_head lru;
>
>                                 /* Or, for the Unevictable "LRU list" slot */
>                                 struct {
>                                         /* Always even, to negate PageTail */
>                                         void *__filler;
>                                         /* Count page's or folio's mlocks */
>                                         unsigned int mlock_count;
>                                 };
>
>                                 /* Or, free page */
>                                 struct list_head buddy_list;
>                                 struct list_head pcp_list;
>                         };
>                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
>                         struct address_space *mapping;
>                         union {
>                                 pgoff_t index;          /* Our offset within mapping. */
>                                 unsigned long share;    /* share count for fsdax */
>                         };
>                         /**
>                          * @private: Mapping-private opaque data.
>                          * Usually used for buffer_heads if PagePrivate.
>                          * Used for swp_entry_t if PageSwapCache.
>                          * Indicates order in the buddy system if PageBuddy.
>                          */
>                         unsigned long private;
>                 };
>
> But within the page pool struct, we have a variable called
> dma_addr that is appropriate for storing dma addr.
> And that struct is used by netstack. That works to our advantage.
>
>                 struct {        /* page_pool used by netstack */
>                         /**
>                          * @pp_magic: magic value to avoid recycling non
>                          * page_pool allocated pages.
>                          */
>                         unsigned long pp_magic;
>                         struct page_pool *pp;
>                         unsigned long _pp_mapping_pad;
>                         unsigned long dma_addr;
>                         atomic_long_t pp_ref_count;
>                 };
>
> On the other side, we should use variables from the same sub-struct.
> So this patch replaces the "private" with "pp".
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c22d1118a133..4446fb54de6d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -48,6 +48,9 @@ module_param(napi_tx, bool, 0644);
>
>  #define VIRTIO_XDP_FLAG        BIT(0)
>
> +#define page_chain_next(p)     ((struct page *)((p)->pp))
> +#define page_chain_add(p, n)   ((p)->pp = (void *)n)
> +
>  /* RX packet size EWMA. The average packet size is used to determine the packet
>   * buffer size when refilling RX rings. As the entire RX ring may be refilled
>   * at once, the weight is chosen so that the EWMA will be insensitive to short-
> @@ -191,7 +194,7 @@ struct receive_queue {
>
>         struct virtnet_interrupt_coalesce intr_coal;
>
> -       /* Chain pages by the private ptr. */
> +       /* Chain pages by the page's pp struct. */
>         struct page *pages;
>
>         /* Average packet length for mergeable receive buffers. */
> @@ -432,16 +435,16 @@ skb_vnet_common_hdr(struct sk_buff *skb)
>  }
>
>  /*
> - * private is used to chain pages for big packets, put the whole
> - * most recent used list in the beginning for reuse
> + * put the whole most recent used list in the beginning for reuse
>   */

While at this, let's explain that pp is used to chain pages, or we can
do it in the definition of page_chain_add().

Others look good.

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks
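
One possible way to address the comment above, sketched here only as an
illustration (the final wording is of course up to the author), is to put
the explanation on the helper definitions themselves:

/* In big mode, unused pages are chained through page->pp. These pages
 * are never handed to the page_pool, so reusing the field for the chain
 * pointer does not conflict with page_pool's use of it.
 */
#define page_chain_next(p)	((struct page *)((p)->pp))
#define page_chain_add(p, n)	((p)->pp = (void *)(n))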


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api
  2024-04-11  2:51 ` [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
  2024-04-18  6:09   ` Jason Wang
@ 2024-04-18  6:13   ` Jason Wang
  1 sibling, 0 replies; 49+ messages in thread
From: Jason Wang @ 2024-04-18  6:13 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now, we have virtio DMA APIs, the driver can be the premapped
> mode whatever the virtio core uses dma api or not.
>
> So remove the limit of checking use_dma_api from
> virtqueue_set_dma_premapped().
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 1b9fb680cff3..72c438c5f7d7 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2730,7 +2730,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
>   *
>   * Returns zero or a negative error.
>   * 0: success.
> - * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> + * -EINVAL: NOT called immediately.

Let's tweak the comment here; for example, we can say that the vq is in use.

Thanks

>   */
>  int virtqueue_set_dma_premapped(struct virtqueue *_vq)
>  {
> @@ -2746,11 +2746,6 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
>                 return -EINVAL;
>         }
>
> -       if (!vq->use_dma_api) {
> -               END_USE(vq);
> -               return -EINVAL;
> -       }
> -
>         vq->premapped = true;
>         vq->do_unmap = false;
>
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-11  2:51 ` [PATCH vhost 4/6] virtio_net: big mode support premapped Xuan Zhuo
                     ` (2 preceding siblings ...)
  2024-04-14  9:48   ` Dan Carpenter
@ 2024-04-18  6:25   ` Jason Wang
  2024-04-18  8:29     ` Xuan Zhuo
  3 siblings, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-18  6:25 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> In big mode, pre-mapping DMA is beneficial because if the pages are not
> used, we can reuse them without needing to unmap and remap.
>
> We require space to store the DMA address. I use the page.dma_addr to
> store the DMA address from the pp structure inside the page.
>
> Every page retrieved from get_a_page() is mapped, and its DMA address is
> stored in page.dma_addr. When a page is returned to the chain, we check
> the DMA status; if it is not mapped (potentially having been unmapped),
> we remap it before returning it to the chain.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 81 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4446fb54de6d..7ea7e9bcd5d7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
>
>  #define page_chain_next(p)     ((struct page *)((p)->pp))
>  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> +#define page_dma_addr(p)       ((p)->dma_addr)
>
>  /* RX packet size EWMA. The average packet size is used to determine the packet
>   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
>         return (struct virtio_net_common_hdr *)skb->cb;
>  }
>
> +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> +{
> +       sg->dma_address = addr;
> +       sg->length = len;
> +}
> +
> +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> +{
> +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> +                                      DMA_FROM_DEVICE, 0);
> +
> +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> +}
> +
> +static int page_chain_map(struct receive_queue *rq, struct page *p)
> +{
> +       dma_addr_t addr;
> +
> +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> +               return -ENOMEM;
> +
> +       page_dma_addr(p) = addr;
> +       return 0;
> +}
> +
> +static void page_chain_release(struct receive_queue *rq)
> +{
> +       struct page *p, *n;
> +
> +       for (p = rq->pages; p; p = n) {
> +               n = page_chain_next(p);
> +
> +               page_chain_unmap(rq, p);
> +               __free_pages(p, 0);
> +       }
> +
> +       rq->pages = NULL;
> +}
> +
>  /*
>   * put the whole most recent used list in the beginning for reuse
>   */
> @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
>  {
>         struct page *end;
>
> +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {

This looks strange, the map should be done during allocation. Under
which condition could we hit this?

> +               if (page_chain_map(rq, page)) {
> +                       __free_pages(page, 0);
> +                       return;
> +               }
> +       }
> +
>         /* Find end of list, sew whole thing into vi->rq.pages. */
>         for (end = page; page_chain_next(end); end = page_chain_next(end));
>
> @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>                 rq->pages = page_chain_next(p);
>                 /* clear chain here, it is used to chain pages */
>                 page_chain_add(p, NULL);
> -       } else
> +       } else {
>                 p = alloc_page(gfp_mask);
> +
> +               if (page_chain_map(rq, p)) {
> +                       __free_pages(p, 0);
> +                       return NULL;
> +               }
> +       }
> +
>         return p;
>  }
>
> @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>                         return NULL;
>
>                 page = page_chain_next(page);
> -               if (page)
> -                       give_pages(rq, page);
>                 goto ok;
>         }
>
> @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
>                 else
>                         page_to_free = page;
> +               page = NULL;
>                 goto ok;
>         }
>
> @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>         BUG_ON(offset >= PAGE_SIZE);
>         while (len) {
>                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> +
> +               /* unmap the page before using it. */
> +               if (!offset)
> +                       page_chain_unmap(rq, page);
> +

This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?

>                 skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
>                                 frag_size, truesize);
>                 len -= frag_size;
> @@ -664,15 +723,15 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>                 offset = 0;
>         }
>
> -       if (page)
> -               give_pages(rq, page);
> -
>  ok:
>         hdr = skb_vnet_common_hdr(skb);
>         memcpy(hdr, hdr_p, hdr_len);
>         if (page_to_free)
>                 put_page(page_to_free);
>
> +       if (page)
> +               give_pages(rq, page);
> +
>         return skb;
>  }
>
> @@ -823,7 +882,8 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>
>         rq = &vi->rq[i];
>
> -       if (rq->do_dma)
> +       /* Skip the unmap for big mode. */
> +       if (!vi->big_packets || vi->mergeable_rx_bufs)
>                 virtnet_rq_unmap(rq, buf, 0);
>
>         virtnet_rq_free_buf(vi, rq, buf);
> @@ -1346,8 +1406,12 @@ static struct sk_buff *receive_big(struct net_device *dev,
>                                    struct virtnet_rq_stats *stats)
>  {
>         struct page *page = buf;
> -       struct sk_buff *skb =
> -               page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, 0);
> +       struct sk_buff *skb;
> +
> +       /* Unmap first page. The follow code may read this page. */
> +       page_chain_unmap(rq, page);

And probably here as well.

Thanks


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 5/6] virtio_net: enable premapped by default
  2024-04-11  2:51 ` [PATCH vhost 5/6] virtio_net: enable premapped by default Xuan Zhuo
@ 2024-04-18  6:26   ` Jason Wang
  2024-04-18  8:35     ` Xuan Zhuo
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-18  6:26 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Currently, big, merge, and small modes all support the premapped mode.
> We can now enable premapped mode by default. Furthermore,
> virtqueue_set_dma_premapped() must succeed when called immediately after
> find_vqs(). Consequently, we can assume that premapped mode is always
> enabled.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7ea7e9bcd5d7..f0faf7c0fe59 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -860,15 +860,13 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>
>  static void virtnet_rq_set_premapped(struct virtnet_info *vi)
>  {
> -       int i;
> -
> -       /* disable for big mode */
> -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> -               return;
> +       int i, err;
>
>         for (i = 0; i < vi->max_queue_pairs; i++) {
> -               if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> -                       continue;
> +               err = virtqueue_set_dma_premapped(vi->rq[i].vq);
> +
> +               /* never happen */
> +               BUG_ON(err);

Nit:

Maybe just a BUG_ON(virtqueue_set_dma_premapped()).

Btw, if there's no way to disable pre mapping, maybe it's better to
rename virtqueue_set_dma_premapped() to
virtqueue_enable_dma_premapped(ing).

Thanks
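
Concretely, the nit would collapse the loop quoted in the patch into
something like this (just a sketch of the suggestion, not the final hunk):

	for (i = 0; i < vi->max_queue_pairs; i++) {
		/* Premapped mode must succeed when enabled right after
		 * find_vqs(), so treat a failure as a bug.
		 */
		BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));

		vi->rq[i].do_dma = true;
	}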

>
>                 vi->rq[i].do_dma = true;
>         }
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 6/6] virtio_net: rx remove premapped failover code
  2024-04-11  2:51 ` [PATCH vhost 6/6] virtio_net: rx remove premapped failover code Xuan Zhuo
@ 2024-04-18  6:31   ` Jason Wang
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Wang @ 2024-04-18  6:31 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now, for the merge and small, the premapped mode can be enabled
> unconditionally.

I guess it's not only merge and small but big mode as well?

>
> So we can remove the failover code.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-18  6:25   ` Jason Wang
@ 2024-04-18  8:29     ` Xuan Zhuo
  2024-04-19  0:43       ` Jason Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-18  8:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > used, we can reuse them without needing to unmap and remap.
> >
> > We require space to store the DMA address. I use the page.dma_addr to
> > store the DMA address from the pp structure inside the page.
> >
> > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > stored in page.dma_addr. When a page is returned to the chain, we check
> > the DMA status; if it is not mapped (potentially having been unmapped),
> > we remap it before returning it to the chain.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> >  1 file changed, 81 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> >
> >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > +#define page_dma_addr(p)       ((p)->dma_addr)
> >
> >  /* RX packet size EWMA. The average packet size is used to determine the packet
> >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> >         return (struct virtio_net_common_hdr *)skb->cb;
> >  }
> >
> > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > +{
> > +       sg->dma_address = addr;
> > +       sg->length = len;
> > +}
> > +
> > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > +{
> > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > +                                      DMA_FROM_DEVICE, 0);
> > +
> > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > +}
> > +
> > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > +{
> > +       dma_addr_t addr;
> > +
> > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > +               return -ENOMEM;
> > +
> > +       page_dma_addr(p) = addr;
> > +       return 0;
> > +}
> > +
> > +static void page_chain_release(struct receive_queue *rq)
> > +{
> > +       struct page *p, *n;
> > +
> > +       for (p = rq->pages; p; p = n) {
> > +               n = page_chain_next(p);
> > +
> > +               page_chain_unmap(rq, p);
> > +               __free_pages(p, 0);
> > +       }
> > +
> > +       rq->pages = NULL;
> > +}
> > +
> >  /*
> >   * put the whole most recent used list in the beginning for reuse
> >   */
> > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> >  {
> >         struct page *end;
> >
> > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
>
> This looks strange, the map should be done during allocation. Under
> which condition could we hit this?

The first page is unmapped before we call page_to_skb().
The page can be put back to the chain in case of failure.


>
> > +               if (page_chain_map(rq, page)) {
> > +                       __free_pages(page, 0);
> > +                       return;
> > +               }
> > +       }
> > +
> >         /* Find end of list, sew whole thing into vi->rq.pages. */
> >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> >
> > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> >                 rq->pages = page_chain_next(p);
> >                 /* clear chain here, it is used to chain pages */
> >                 page_chain_add(p, NULL);
> > -       } else
> > +       } else {
> >                 p = alloc_page(gfp_mask);
> > +
> > +               if (page_chain_map(rq, p)) {
> > +                       __free_pages(p, 0);
> > +                       return NULL;
> > +               }
> > +       }
> > +
> >         return p;
> >  }
> >
> > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >                         return NULL;
> >
> >                 page = page_chain_next(page);
> > -               if (page)
> > -                       give_pages(rq, page);
> >                 goto ok;
> >         }
> >
> > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> >                 else
> >                         page_to_free = page;
> > +               page = NULL;
> >                 goto ok;
> >         }
> >
> > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >         BUG_ON(offset >= PAGE_SIZE);
> >         while (len) {
> >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > +
> > +               /* unmap the page before using it. */
> > +               if (!offset)
> > +                       page_chain_unmap(rq, page);
> > +
>
> This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?

I think we do not need that, because the unmap API already does the sync.
We do not work with DMA_SKIP_SYNC.

Thanks.


>
> >                 skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
> >                                 frag_size, truesize);
> >                 len -= frag_size;
> > @@ -664,15 +723,15 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >                 offset = 0;
> >         }
> >
> > -       if (page)
> > -               give_pages(rq, page);
> > -
> >  ok:
> >         hdr = skb_vnet_common_hdr(skb);
> >         memcpy(hdr, hdr_p, hdr_len);
> >         if (page_to_free)
> >                 put_page(page_to_free);
> >
> > +       if (page)
> > +               give_pages(rq, page);
> > +
> >         return skb;
> >  }
> >
> > @@ -823,7 +882,8 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> >
> >         rq = &vi->rq[i];
> >
> > -       if (rq->do_dma)
> > +       /* Skip the unmap for big mode. */
> > +       if (!vi->big_packets || vi->mergeable_rx_bufs)
> >                 virtnet_rq_unmap(rq, buf, 0);
> >
> >         virtnet_rq_free_buf(vi, rq, buf);
> > @@ -1346,8 +1406,12 @@ static struct sk_buff *receive_big(struct net_device *dev,
> >                                    struct virtnet_rq_stats *stats)
> >  {
> >         struct page *page = buf;
> > -       struct sk_buff *skb =
> > -               page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, 0);
> > +       struct sk_buff *skb;
> > +
> > +       /* Unmap first page. The follow code may read this page. */
> > +       page_chain_unmap(rq, page);
>
> And probably here as well.
>
> Thanks
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 5/6] virtio_net: enable premapped by default
  2024-04-18  6:26   ` Jason Wang
@ 2024-04-18  8:35     ` Xuan Zhuo
  2024-04-19  0:44       ` Jason Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-18  8:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, 18 Apr 2024 14:26:33 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Currently, big, merge, and small modes all support the premapped mode.
> > We can now enable premapped mode by default. Furthermore,
> > virtqueue_set_dma_premapped() must succeed when called immediately after
> > find_vqs(). Consequently, we can assume that premapped mode is always
> > enabled.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 7ea7e9bcd5d7..f0faf7c0fe59 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -860,15 +860,13 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >
> >  static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> >  {
> > -       int i;
> > -
> > -       /* disable for big mode */
> > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > -               return;
> > +       int i, err;
> >
> >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > -               if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> > -                       continue;
> > +               err = virtqueue_set_dma_premapped(vi->rq[i].vq);
> > +
> > +               /* never happen */
> > +               BUG_ON(err);
>
> Nit:
>
> Maybe just a BUG_ON(virtqueue_set_dma_premapped()).

OK


>
> Btw, if there's no way to disable pre mapping, maybe it's better to
> rename virtqueue_set_dma_premapped() to
> virtqueue_enable_dma_premapped(ing).

This patch will add a way to disable premapping:

	https://lore.kernel.org/all/20240327111430.108787-11-xuanzhuo@linux.alibaba.com/

Thanks.


>
> Thanks
>
> >
> >                 vi->rq[i].do_dma = true;
> >         }
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-17  8:20                         ` Xuan Zhuo
  2024-04-18  4:15                           ` Jason Wang
@ 2024-04-18 20:19                           ` Jesper Dangaard Brouer
  2024-04-18 21:56                             ` Matthew Wilcox
  2024-04-19  7:11                             ` Xuan Zhuo
  1 sibling, 2 replies; 49+ messages in thread
From: Jesper Dangaard Brouer @ 2024-04-18 20:19 UTC (permalink / raw)
  To: Xuan Zhuo, Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, Linux-MM,
	Matthew Wilcox, Ilias Apalodimas, Mel Gorman



On 17/04/2024 10.20, Xuan Zhuo wrote:
> On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>
>>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>
>>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>
>>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, we chain the pages of big mode by the page's private variable.
>>>>>>>>>>>>> But a subsequent patch aims to make the big mode to support
>>>>>>>>>>>>> premapped mode. This requires additional space to store the dma addr.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Within the sub-struct that contains the 'private', there is no suitable
>>>>>>>>>>>>> variable for storing the DMA addr.
>>>>>>>>>>>>>
>>>>>>>>>>>>>                  struct {        /* Page cache and anonymous pages */
>>>>>>>>>>>>>                          /**
>>>>>>>>>>>>>                           * @lru: Pageout list, eg. active_list protected by
>>>>>>>>>>>>>                           * lruvec->lru_lock.  Sometimes used as a generic list
>>>>>>>>>>>>>                           * by the page owner.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          union {
>>>>>>>>>>>>>                                  struct list_head lru;
>>>>>>>>>>>>>
>>>>>>>>>>>>>                                  /* Or, for the Unevictable "LRU list" slot */
>>>>>>>>>>>>>                                  struct {
>>>>>>>>>>>>>                                          /* Always even, to negate PageTail */
>>>>>>>>>>>>>                                          void *__filler;
>>>>>>>>>>>>>                                          /* Count page's or folio's mlocks */
>>>>>>>>>>>>>                                          unsigned int mlock_count;
>>>>>>>>>>>>>                                  };
>>>>>>>>>>>>>
>>>>>>>>>>>>>                                  /* Or, free page */
>>>>>>>>>>>>>                                  struct list_head buddy_list;
>>>>>>>>>>>>>                                  struct list_head pcp_list;
>>>>>>>>>>>>>                          };
>>>>>>>>>>>>>                          /* See page-flags.h for PAGE_MAPPING_FLAGS */
>>>>>>>>>>>>>                          struct address_space *mapping;
>>>>>>>>>>>>>                          union {
>>>>>>>>>>>>>                                  pgoff_t index;          /* Our offset within mapping. */
>>>>>>>>>>>>>                                  unsigned long share;    /* share count for fsdax */
>>>>>>>>>>>>>                          };
>>>>>>>>>>>>>                          /**
>>>>>>>>>>>>>                           * @private: Mapping-private opaque data.
>>>>>>>>>>>>>                           * Usually used for buffer_heads if PagePrivate.
>>>>>>>>>>>>>                           * Used for swp_entry_t if PageSwapCache.
>>>>>>>>>>>>>                           * Indicates order in the buddy system if PageBuddy.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          unsigned long private;
>>>>>>>>>>>>>                  };
>>>>>>>>>>>>>
>>>>>>>>>>>>> But within the page pool struct, we have a variable called
>>>>>>>>>>>>> dma_addr that is appropriate for storing dma addr.
>>>>>>>>>>>>> And that struct is used by netstack. That works to our advantage.
>>>>>>>>>>>>>
>>>>>>>>>>>>>                  struct {        /* page_pool used by netstack */
>>>>>>>>>>>>>                          /**
>>>>>>>>>>>>>                           * @pp_magic: magic value to avoid recycling non
>>>>>>>>>>>>>                           * page_pool allocated pages.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          unsigned long pp_magic;
>>>>>>>>>>>>>                          struct page_pool *pp;
>>>>>>>>>>>>>                          unsigned long _pp_mapping_pad;
>>>>>>>>>>>>>                          unsigned long dma_addr;
>>>>>>>>>>>>>                          atomic_long_t pp_ref_count;
>>>>>>>>>>>>>                  };
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the other side, we should use variables from the same sub-struct.
>>>>>>>>>>>>> So this patch replaces the "private" with "pp".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> Instead of doing a customized version of page pool, can we simply
>>>>>>>>>>>> switch to use page pool for big mode instead? Then we don't need to
>>>>>>>>>>>> bother the dma stuffs.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The page pool needs to do the dma by the DMA APIs.
>>>>>>>>>>> So we can not use the page pool directly.
>>>>>>>>>>
>>>>>>>>>> I found this:
>>>>>>>>>>
>>>>>>>>>> define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
>>>>>>>>>>                                          * map/unmap
>>>>>>>>>>
>>>>>>>>>> It seems to work here?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have studied the page pool mechanism and believe that we cannot use it
>>>>>>>>> directly. We can make the page pool to bypass the DMA operations.
>>>>>>>>> This allows us to handle DMA within virtio-net for pages allocated from the page
>>>>>>>>> pool. Furthermore, we can utilize page pool helpers to associate the DMA address
>>>>>>>>> to the page.
>>>>>>>>>
>>>>>>>>> However, the critical issue pertains to unmapping. Ideally, we want to return
>>>>>>>>> the mapped pages to the page pool and reuse them. In doing so, we can omit the
>>>>>>>>> unmapping and remapping steps.
>>>>>>>>>
>>>>>>>>> Currently, there's a caveat: when the page pool cache is full, it disconnects
>>>>>>>>> and releases the pages. When the pool hits its capacity, pages are relinquished
>>>>>>>>> without a chance for unmapping.

Could Jakub's memory provider for PP help your use-case?

See: [1] https://lore.kernel.org/all/20240403002053.2376017-3-almasrymina@google.com/
Or: [2] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.com/T/


[...]
>>>>>>
>>>>>> Adding Jesper for some comments.
>>>>>>
>>>>>>>
>>>>>>> Back to this patch set, I think we should keep the virtio-net to manage
>>>>>>> the pages.
>>>>>>>

For context the patch:
  [3] https://lore.kernel.org/all/20240411025127.51945-4-xuanzhuo@linux.alibaba.com/

>>>>>>> What do you think?
>>>>>>
>>>>>> I might be wrong, but I think if we need to either
>>>>>>
>>>>>> 1) seek a way to manage the pages by yourself but not touching page
>>>>>> pool metadata (or Jesper is fine with this)
>>>>>
>>>>> Do you mean working with page pool or not?
>>>>>
>>>>
>>>> I meant if Jesper is fine with reusing page pool metadata like this patch.
>>>>
>>>>> If we manage the pages by self(no page pool), we do not care the metadata is for
>>>>> page pool or not. We just use the space of pages like the "private".
>>>>
>>>> That's also fine.
>>>>

I'm not sure it is "fine" to explicitly choose not to use page pool
and then (ab)use the `struct page` member (pp) that is intended for
page_pool for other stuff (in this case, to create a linked list of pages).

  +#define page_chain_next(p)	((struct page *)((p)->pp))
  +#define page_chain_add(p, n)	((p)->pp = (void *)n)

I'm not sure that I (as PP maintainer) can make this call actually, as I
think this area belongs with the MM "page" maintainers (Cc MM-list +
people) to judge.

Just inventing new ways to use struct page fields, without adding your
use-case to struct page, will make it harder for MM people to maintain
(e.g. to make future changes).

--Jesper



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-18 20:19                           ` Jesper Dangaard Brouer
@ 2024-04-18 21:56                             ` Matthew Wilcox
  2024-04-19  7:11                             ` Xuan Zhuo
  1 sibling, 0 replies; 49+ messages in thread
From: Matthew Wilcox @ 2024-04-18 21:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Xuan Zhuo, Jason Wang, virtualization, Michael S. Tsirkin,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, Linux-MM, Ilias Apalodimas, Mel Gorman

On Thu, Apr 18, 2024 at 10:19:33PM +0200, Jesper Dangaard Brouer wrote:
> I'm not sure it is "fine" to, explicitly choosing not to use page pool,
> and then (ab)use `struct page` member (pp) that intended for page_pool
> for other stuff. (In this case create a linked list of pages).
> 
>  +#define page_chain_next(p)	((struct page *)((p)->pp))
>  +#define page_chain_add(p, n)	((p)->pp = (void *)n)
> 
> I'm not sure that I (as PP maintainer) can make this call actually, as I
> think this area belong with the MM "page" maintainers (Cc MM-list +
> people) to judge.
> 
> Just invention new ways to use struct page fields without adding your
> use-case to struct page, will make it harder for MM people to maintain
> (e.g. make future change).

I can't really follow what's being proposed; the quoting is quite deep.

Here's the current plan for struct page:

 - The individual users are being split off.  This has already happened
   for struct folio, struct slab and struct pgdesc.  Others are hopefully
   coming.
 - At some point, struct page will become:

   struct page {
     unsigned long flags;
     unsigned long data[5];
     unsigned int data2[2];
     ... some other bits and pieces ...
   };

 - After that, we will turn struct page into:

  struct page {
    unsigned long memdesc;
  };

Users like pagepool will allocate a struct ppdesc that will be
referred to by the memdesc.  The bottom 4 bits will identify it as a
ppdesc.  You can put anything you like in a struct ppdesc, it just has
to be allocated from a slab with a 16 byte alignment.

More details here:
https://kernelnewbies.org/MatthewWilcox/Memdescs

This is all likely to land in 2025.  The goal for 2024 is to remove
mapping & index from 'struct page'.  This has been in progress since
2019 so I'm really excited that we're so close!  If you want to
turn struct ppdesc into its own struct like folio, slab & ptdesc,
I'm happy to help.  I once had a patchset for that:

https://lore.kernel.org/netdev/20221130220803.3657490-1-willy@infradead.org/

but I'm sure it's truly bitrotted.
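
To make the tagging scheme above concrete, here is a toy sketch (my own
illustration, not actual kernel code; the MEMDESC_TYPE_PPDESC value is
made up): a descriptor allocated with 16-byte alignment leaves the low
4 bits of its address free to encode the descriptor type.

#define MEMDESC_TYPE_MASK	0xfUL
#define MEMDESC_TYPE_PPDESC	0x3UL	/* hypothetical type code */

static inline unsigned long memdesc_encode(void *desc, unsigned long type)
{
	/* desc must come from a slab with 16-byte alignment. */
	return (unsigned long)desc | type;
}

static inline void *memdesc_to_desc(unsigned long memdesc)
{
	return (void *)(memdesc & ~MEMDESC_TYPE_MASK);
}

static inline bool memdesc_is_ppdesc(unsigned long memdesc)
{
	return (memdesc & MEMDESC_TYPE_MASK) == MEMDESC_TYPE_PPDESC;
}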

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-18  8:29     ` Xuan Zhuo
@ 2024-04-19  0:43       ` Jason Wang
  2024-04-19  4:21         ` Xuan Zhuo
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-19  0:43 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 18, 2024 at 4:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > used, we can reuse them without needing to unmap and remap.
> > >
> > > We require space to store the DMA address. I use the page.dma_addr to
> > > store the DMA address from the pp structure inside the page.
> > >
> > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > we remap it before returning it to the chain.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> > >  1 file changed, 81 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> > >
> > >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> > >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > > +#define page_dma_addr(p)       ((p)->dma_addr)
> > >
> > >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > >         return (struct virtio_net_common_hdr *)skb->cb;
> > >  }
> > >
> > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > +{
> > > +       sg->dma_address = addr;
> > > +       sg->length = len;
> > > +}
> > > +
> > > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > > +{
> > > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > > +                                      DMA_FROM_DEVICE, 0);
> > > +
> > > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > > +}
> > > +
> > > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > > +{
> > > +       dma_addr_t addr;
> > > +
> > > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > > +               return -ENOMEM;
> > > +
> > > +       page_dma_addr(p) = addr;
> > > +       return 0;
> > > +}
> > > +
> > > +static void page_chain_release(struct receive_queue *rq)
> > > +{
> > > +       struct page *p, *n;
> > > +
> > > +       for (p = rq->pages; p; p = n) {
> > > +               n = page_chain_next(p);
> > > +
> > > +               page_chain_unmap(rq, p);
> > > +               __free_pages(p, 0);
> > > +       }
> > > +
> > > +       rq->pages = NULL;
> > > +}
> > > +
> > >  /*
> > >   * put the whole most recent used list in the beginning for reuse
> > >   */
> > > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > >  {
> > >         struct page *end;
> > >
> > > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
> >
> > This looks strange, the map should be done during allocation. Under
> > which condition could we hit this?
>
> This first page is umapped before we call page_to_skb().
> The page can be put back to the link in case of failure.

See below.

>
>
> >
> > > +               if (page_chain_map(rq, page)) {
> > > +                       __free_pages(page, 0);
> > > +                       return;
> > > +               }
> > > +       }
> > > +
> > >         /* Find end of list, sew whole thing into vi->rq.pages. */
> > >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> > >
> > > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> > >                 rq->pages = page_chain_next(p);
> > >                 /* clear chain here, it is used to chain pages */
> > >                 page_chain_add(p, NULL);
> > > -       } else
> > > +       } else {
> > >                 p = alloc_page(gfp_mask);
> > > +
> > > +               if (page_chain_map(rq, p)) {
> > > +                       __free_pages(p, 0);
> > > +                       return NULL;
> > > +               }
> > > +       }
> > > +
> > >         return p;
> > >  }
> > >
> > > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >                         return NULL;
> > >
> > >                 page = page_chain_next(page);
> > > -               if (page)
> > > -                       give_pages(rq, page);
> > >                 goto ok;
> > >         }
> > >
> > > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> > >                 else
> > >                         page_to_free = page;
> > > +               page = NULL;
> > >                 goto ok;
> > >         }
> > >
> > > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >         BUG_ON(offset >= PAGE_SIZE);
> > >         while (len) {
> > >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > +
> > > +               /* unmap the page before using it. */
> > > +               if (!offset)
> > > +                       page_chain_unmap(rq, page);
> > > +
> >
> > This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?
>
> I think we do not need that. Because the umap api does it.
> We do not work with DMA_SKIP_SYNC;

Well, the problem is that unmap is too heavyweight, and it undermines the
effort of trying to avoid maps/unmaps as much as possible.

For example, in most cases DMA sync is just a nop, and such an
unmap() causes the strange code in give_pages() that we discussed above.

Thanks
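
If I understand the suggestion, the direction would be something like the
sketch below: sync the page for the CPU before reading it instead of
unmapping it, so the mapping stays alive and give_pages() never needs to
remap. This assumes the existing virtqueue_dma_sync_single_range_for_cpu()
helper fits here; page_chain_sync_for_cpu() is a made-up name.

static void page_chain_sync_for_cpu(struct receive_queue *rq, struct page *p)
{
	/* Make the device-written data visible to the CPU without
	 * tearing down the mapping stored in page->dma_addr.
	 */
	virtqueue_dma_sync_single_range_for_cpu(rq->vq, page_dma_addr(p), 0,
						PAGE_SIZE, DMA_FROM_DEVICE);
}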


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 5/6] virtio_net: enable premapped by default
  2024-04-18  8:35     ` Xuan Zhuo
@ 2024-04-19  0:44       ` Jason Wang
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Wang @ 2024-04-19  0:44 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Apr 18, 2024 at 4:37 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 18 Apr 2024 14:26:33 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > Currently, big, merge, and small modes all support the premapped mode.
> > > We can now enable premapped mode by default. Furthermore,
> > > virtqueue_set_dma_premapped() must succeed when called immediately after
> > > find_vqs(). Consequently, we can assume that premapped mode is always
> > > enabled.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio_net.c | 12 +++++-------
> > >  1 file changed, 5 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 7ea7e9bcd5d7..f0faf7c0fe59 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -860,15 +860,13 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > >
> > >  static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > >  {
> > > -       int i;
> > > -
> > > -       /* disable for big mode */
> > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > -               return;
> > > +       int i, err;
> > >
> > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > -               if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> > > -                       continue;
> > > +               err = virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > +
> > > +               /* never happen */
> > > +               BUG_ON(err);
> >
> > Nit:
> >
> > Maybe just a BUG_ON(virtqueue_set_dma_premapped()).
>
> OK
>
>
> >
> > Btw, if there's no way to disable pre mapping, maybe it's better to
> > rename virtqueue_set_dma_premapped() to
> > virtqueue_enable_dma_premapped(ing).
>
> This patch will add a way to disable pre mapping.
>
>         https://lore.kernel.org/all/20240327111430.108787-11-xuanzhuo@linux.alibaba.com/
>
> Thanks.

Ok, fine.

Thanks

>
>
> >
> > Thanks
> >
> > >
> > >                 vi->rq[i].do_dma = true;
> > >         }
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-19  0:43       ` Jason Wang
@ 2024-04-19  4:21         ` Xuan Zhuo
  2024-04-19  5:46           ` Jason Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-19  4:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 19 Apr 2024 08:43:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Apr 18, 2024 at 4:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > used, we can reuse them without needing to unmap and remap.
> > > >
> > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > store the DMA address from the pp structure inside the page.
> > > >
> > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > we remap it before returning it to the chain.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> > > >  1 file changed, 81 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> > > >
> > > >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> > > >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > > > +#define page_dma_addr(p)       ((p)->dma_addr)
> > > >
> > > >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > > >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > > > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > >  }
> > > >
> > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > +{
> > > > +       sg->dma_address = addr;
> > > > +       sg->length = len;
> > > > +}
> > > > +
> > > > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > > > +{
> > > > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > > > +                                      DMA_FROM_DEVICE, 0);
> > > > +
> > > > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > > > +}
> > > > +
> > > > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > > > +{
> > > > +       dma_addr_t addr;
> > > > +
> > > > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > > > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > > > +               return -ENOMEM;
> > > > +
> > > > +       page_dma_addr(p) = addr;
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static void page_chain_release(struct receive_queue *rq)
> > > > +{
> > > > +       struct page *p, *n;
> > > > +
> > > > +       for (p = rq->pages; p; p = n) {
> > > > +               n = page_chain_next(p);
> > > > +
> > > > +               page_chain_unmap(rq, p);
> > > > +               __free_pages(p, 0);
> > > > +       }
> > > > +
> > > > +       rq->pages = NULL;
> > > > +}
> > > > +
> > > >  /*
> > > >   * put the whole most recent used list in the beginning for reuse
> > > >   */
> > > > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > > >  {
> > > >         struct page *end;
> > > >
> > > > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
> > >
> > > This looks strange, the map should be done during allocation. Under
> > > which condition could we hit this?
> >
> > This first page is umapped before we call page_to_skb().
> > The page can be put back to the link in case of failure.
>
> See below.
>
> >
> >
> > >
> > > > +               if (page_chain_map(rq, page)) {
> > > > +                       __free_pages(page, 0);
> > > > +                       return;
> > > > +               }
> > > > +       }
> > > > +
> > > >         /* Find end of list, sew whole thing into vi->rq.pages. */
> > > >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> > > >
> > > > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> > > >                 rq->pages = page_chain_next(p);
> > > >                 /* clear chain here, it is used to chain pages */
> > > >                 page_chain_add(p, NULL);
> > > > -       } else
> > > > +       } else {
> > > >                 p = alloc_page(gfp_mask);
> > > > +
> > > > +               if (page_chain_map(rq, p)) {
> > > > +                       __free_pages(p, 0);
> > > > +                       return NULL;
> > > > +               }
> > > > +       }
> > > > +
> > > >         return p;
> > > >  }
> > > >
> > > > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > >                         return NULL;
> > > >
> > > >                 page = page_chain_next(page);
> > > > -               if (page)
> > > > -                       give_pages(rq, page);
> > > >                 goto ok;
> > > >         }
> > > >
> > > > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> > > >                 else
> > > >                         page_to_free = page;
> > > > +               page = NULL;
> > > >                 goto ok;
> > > >         }
> > > >
> > > > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > >         BUG_ON(offset >= PAGE_SIZE);
> > > >         while (len) {
> > > >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > > +
> > > > +               /* unmap the page before using it. */
> > > > +               if (!offset)
> > > > +                       page_chain_unmap(rq, page);
> > > > +
> > >
> > > This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?
> >
> > I think we do not need that. Because the umap api does it.
> > We do not work with DMA_SKIP_SYNC;
>
> Well, the problem is unmap is too heavyweight and it reduces the
> effort of trying to avoid map/umaps as much as possible.
>
> For example, for most of the case DMA sync is just a nop. And such
> unmap() cause strange code in give_pages() as we discuss above?

YES. You are right. For the first page, we just need to sync for the CPU,
and we do not need to check the dma status.
But here (in page_to_skb), we need to call unmap, because this page is
handed over to the skb.

Thanks.


>
> Thanks
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-19  4:21         ` Xuan Zhuo
@ 2024-04-19  5:46           ` Jason Wang
  2024-04-19  7:03             ` Xuan Zhuo
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-19  5:46 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, Apr 19, 2024 at 12:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 19 Apr 2024 08:43:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Apr 18, 2024 at 4:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > used, we can reuse them without needing to unmap and remap.
> > > > >
> > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > store the DMA address from the pp structure inside the page.
> > > > >
> > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > we remap it before returning it to the chain.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> > > > >  1 file changed, 81 insertions(+), 17 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> > > > >
> > > > >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> > > > >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > > > > +#define page_dma_addr(p)       ((p)->dma_addr)
> > > > >
> > > > >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > > > >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > > > > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > >  }
> > > > >
> > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > +{
> > > > > +       sg->dma_address = addr;
> > > > > +       sg->length = len;
> > > > > +}
> > > > > +
> > > > > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > > > > +{
> > > > > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > > > > +                                      DMA_FROM_DEVICE, 0);
> > > > > +
> > > > > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > > > > +}
> > > > > +
> > > > > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > > > > +{
> > > > > +       dma_addr_t addr;
> > > > > +
> > > > > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > > > > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > > > > +               return -ENOMEM;
> > > > > +
> > > > > +       page_dma_addr(p) = addr;
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > > +static void page_chain_release(struct receive_queue *rq)
> > > > > +{
> > > > > +       struct page *p, *n;
> > > > > +
> > > > > +       for (p = rq->pages; p; p = n) {
> > > > > +               n = page_chain_next(p);
> > > > > +
> > > > > +               page_chain_unmap(rq, p);
> > > > > +               __free_pages(p, 0);
> > > > > +       }
> > > > > +
> > > > > +       rq->pages = NULL;
> > > > > +}
> > > > > +
> > > > >  /*
> > > > >   * put the whole most recent used list in the beginning for reuse
> > > > >   */
> > > > > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > > > >  {
> > > > >         struct page *end;
> > > > >
> > > > > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
> > > >
> > > > This looks strange, the map should be done during allocation. Under
> > > > which condition could we hit this?
> > >
> > > This first page is umapped before we call page_to_skb().
> > > The page can be put back to the link in case of failure.
> >
> > See below.
> >
> > >
> > >
> > > >
> > > > > +               if (page_chain_map(rq, page)) {
> > > > > +                       __free_pages(page, 0);
> > > > > +                       return;
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > >         /* Find end of list, sew whole thing into vi->rq.pages. */
> > > > >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> > > > >
> > > > > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> > > > >                 rq->pages = page_chain_next(p);
> > > > >                 /* clear chain here, it is used to chain pages */
> > > > >                 page_chain_add(p, NULL);
> > > > > -       } else
> > > > > +       } else {
> > > > >                 p = alloc_page(gfp_mask);
> > > > > +
> > > > > +               if (page_chain_map(rq, p)) {
> > > > > +                       __free_pages(p, 0);
> > > > > +                       return NULL;
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > >         return p;
> > > > >  }
> > > > >
> > > > > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > >                         return NULL;
> > > > >
> > > > >                 page = page_chain_next(page);
> > > > > -               if (page)
> > > > > -                       give_pages(rq, page);
> > > > >                 goto ok;
> > > > >         }
> > > > >
> > > > > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> > > > >                 else
> > > > >                         page_to_free = page;
> > > > > +               page = NULL;
> > > > >                 goto ok;
> > > > >         }
> > > > >
> > > > > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > >         BUG_ON(offset >= PAGE_SIZE);
> > > > >         while (len) {
> > > > >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > > > +
> > > > > +               /* unmap the page before using it. */
> > > > > +               if (!offset)
> > > > > +                       page_chain_unmap(rq, page);
> > > > > +
> > > >
> > > > This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?
> > >
> > > I think we do not need that. Because the umap api does it.
> > > We do not work with DMA_SKIP_SYNC;
> >
> > Well, the problem is unmap is too heavyweight and it reduces the
> > effort of trying to avoid map/umaps as much as possible.
> >
> > For example, for most of the case DMA sync is just a nop. And such
> > unmap() cause strange code in give_pages() as we discuss above?
>
> YES. You are right. For the first page, we just need to sync for cpu.
> And we do not need to check the dma status.
> But here (in page_to_skb), we need to call unmap, because this page is put
> to the skb.

Right, but the issue remains:

The only case that we may hit

        if (page_dma_addr(page) == DMA_MAPPING_ERROR)

is when the packet is smaller than GOOD_COPY_LEN.

So if we sync_for_cpu for the head page, we:

1) don't need the unmap in receive_big()
2) can do a sync_for_cpu() just before skb_put_data(), so the page could be
recycled to the pool without unmapping?

And I think we should do something similar for the mergeable case?

Btw, I found a misleading comment introduced by f80bd740cb7c9

        /* copy small packet so we can reuse these pages */
        if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {

We're not copying but building the skb around the head page.

Thanks
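
A minimal sketch of 2), assuming a small helper built on the existing
virtqueue_dma_sync_single_range_for_cpu(); the helper name and the exact
call site in the small-packet path are assumptions, not part of the series:

	/* Sketch: sync the head page for the CPU instead of unmapping it,
	 * so it can go back to rq->pages still mapped.
	 */
	static void page_chain_sync_for_cpu(struct receive_queue *rq,
					    struct page *p, unsigned int len)
	{
		virtqueue_dma_sync_single_range_for_cpu(rq->vq, page_dma_addr(p),
							0, len, DMA_FROM_DEVICE);
	}

	/* In the small-packet path of page_to_skb(): copy the data, then
	 * recycle the still-mapped page instead of unmapping it.
	 */
	page_chain_sync_for_cpu(rq, page, len);
	skb_put_data(skb, page_address(page) + offset, len);
	give_pages(rq, page);	/* page stays mapped, no remap needed */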

>
> Thanks.
>
>
> >
> > Thanks
> >
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-19  5:46           ` Jason Wang
@ 2024-04-19  7:03             ` Xuan Zhuo
  2024-04-19  7:24               ` Jason Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-19  7:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 19 Apr 2024 13:46:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Apr 19, 2024 at 12:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 19 Apr 2024 08:43:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Apr 18, 2024 at 4:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > >
> > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > store the DMA address from the pp structure inside the page.
> > > > > >
> > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > we remap it before returning it to the chain.
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> > > > > >  1 file changed, 81 insertions(+), 17 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> > > > > >
> > > > > >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> > > > > >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > > > > > +#define page_dma_addr(p)       ((p)->dma_addr)
> > > > > >
> > > > > >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > > > > >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > > > > > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > >  }
> > > > > >
> > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > +{
> > > > > > +       sg->dma_address = addr;
> > > > > > +       sg->length = len;
> > > > > > +}
> > > > > > +
> > > > > > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > > > > > +{
> > > > > > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > > > > > +                                      DMA_FROM_DEVICE, 0);
> > > > > > +
> > > > > > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > > > > > +}
> > > > > > +
> > > > > > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > > > > > +{
> > > > > > +       dma_addr_t addr;
> > > > > > +
> > > > > > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > > > > > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > > > > > +               return -ENOMEM;
> > > > > > +
> > > > > > +       page_dma_addr(p) = addr;
> > > > > > +       return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static void page_chain_release(struct receive_queue *rq)
> > > > > > +{
> > > > > > +       struct page *p, *n;
> > > > > > +
> > > > > > +       for (p = rq->pages; p; p = n) {
> > > > > > +               n = page_chain_next(p);
> > > > > > +
> > > > > > +               page_chain_unmap(rq, p);
> > > > > > +               __free_pages(p, 0);
> > > > > > +       }
> > > > > > +
> > > > > > +       rq->pages = NULL;
> > > > > > +}
> > > > > > +
> > > > > >  /*
> > > > > >   * put the whole most recent used list in the beginning for reuse
> > > > > >   */
> > > > > > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > > > > >  {
> > > > > >         struct page *end;
> > > > > >
> > > > > > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
> > > > >
> > > > > This looks strange, the map should be done during allocation. Under
> > > > > which condition could we hit this?
> > > >
> > > > This first page is umapped before we call page_to_skb().
> > > > The page can be put back to the link in case of failure.
> > >
> > > See below.
> > >
> > > >
> > > >
> > > > >
> > > > > > +               if (page_chain_map(rq, page)) {
> > > > > > +                       __free_pages(page, 0);
> > > > > > +                       return;
> > > > > > +               }
> > > > > > +       }
> > > > > > +
> > > > > >         /* Find end of list, sew whole thing into vi->rq.pages. */
> > > > > >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> > > > > >
> > > > > > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> > > > > >                 rq->pages = page_chain_next(p);
> > > > > >                 /* clear chain here, it is used to chain pages */
> > > > > >                 page_chain_add(p, NULL);
> > > > > > -       } else
> > > > > > +       } else {
> > > > > >                 p = alloc_page(gfp_mask);
> > > > > > +
> > > > > > +               if (page_chain_map(rq, p)) {
> > > > > > +                       __free_pages(p, 0);
> > > > > > +                       return NULL;
> > > > > > +               }
> > > > > > +       }
> > > > > > +
> > > > > >         return p;
> > > > > >  }
> > > > > >
> > > > > > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > >                         return NULL;
> > > > > >
> > > > > >                 page = page_chain_next(page);
> > > > > > -               if (page)
> > > > > > -                       give_pages(rq, page);
> > > > > >                 goto ok;
> > > > > >         }
> > > > > >
> > > > > > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> > > > > >                 else
> > > > > >                         page_to_free = page;
> > > > > > +               page = NULL;
> > > > > >                 goto ok;
> > > > > >         }
> > > > > >
> > > > > > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > >         BUG_ON(offset >= PAGE_SIZE);
> > > > > >         while (len) {
> > > > > >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > > > > +
> > > > > > +               /* unmap the page before using it. */
> > > > > > +               if (!offset)
> > > > > > +                       page_chain_unmap(rq, page);
> > > > > > +
> > > > >
> > > > > This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?
> > > >
> > > > I think we do not need that. Because the umap api does it.
> > > > We do not work with DMA_SKIP_SYNC;
> > >
> > > Well, the problem is unmap is too heavyweight and it reduces the
> > > effort of trying to avoid map/umaps as much as possible.
> > >
> > > For example, for most of the case DMA sync is just a nop. And such
> > > unmap() cause strange code in give_pages() as we discuss above?
> >
> > YES. You are right. For the first page, we just need to sync for cpu.
> > And we do not need to check the dma status.
> > But here (in page_to_skb), we need to call unmap, because this page is put
> > to the skb.
>
> Right, but issue still,
>
> The only case that we may hit
>
>         if (page_dma_addr(page) == DMA_MAPPING_ERROR)
>
> is when the packet is smaller than GOOD_COPY_LEN.
>
> So if we sync_for_cpu for the head page, we don't do:
>
> 1) unmap in the receive_big()
> 2) do snyc_for_cpu() just before skb_put_data(), so the page could be
> recycled to the pool without unmapping?


I do not get it.

I think we can remove the "if (page_dma_addr(page) == DMA_MAPPING_ERROR)"
check from give_pages(). We would just do the unmap when the page is leaving virtio-net.

>
> And I think we should do something similar for the mergeable case?

Do what?

We already use the sync API for the mergeable case.


>
> Btw, I found one the misleading comment introduced by f80bd740cb7c9
>
>         /* copy small packet so we can reuse these pages */
>         if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
>
> We're not copying but building skb around the head page.

Will fix.

Thanks.
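
For clarity, a sketch of give_pages() with that check dropped; the tail of
the function is reconstructed from context and may not match the actual
code exactly:

	/* Sketch only: pages on rq->pages stay mapped, so give_pages() no
	 * longer needs the DMA_MAPPING_ERROR check; the unmap happens only
	 * when a page leaves virtio-net.
	 */
	static void give_pages(struct receive_queue *rq, struct page *page)
	{
		struct page *end;

		/* Find end of list, sew whole thing into vi->rq.pages. */
		for (end = page; page_chain_next(end); end = page_chain_next(end));

		page_chain_add(end, rq->pages);
		rq->pages = page;
	}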


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
  2024-04-18 20:19                           ` Jesper Dangaard Brouer
  2024-04-18 21:56                             ` Matthew Wilcox
@ 2024-04-19  7:11                             ` Xuan Zhuo
  1 sibling, 0 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-19  7:11 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, Linux-MM,
	Matthew Wilcox, Ilias Apalodimas, Mel Gorman, Jason Wang

On Thu, 18 Apr 2024 22:19:33 +0200, Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>
>
> On 17/04/2024 10.20, Xuan Zhuo wrote:
> > On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>
> >>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>>
> >>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>>>>
> >>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Now, we chain the pages of big mode by the page's private variable.
> >>>>>>>>>>>>> But a subsequent patch aims to make the big mode to support
> >>>>>>>>>>>>> premapped mode. This requires additional space to store the dma addr.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Within the sub-struct that contains the 'private', there is no suitable
> >>>>>>>>>>>>> variable for storing the DMA addr.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                  struct {        /* Page cache and anonymous pages */
> >>>>>>>>>>>>>                          /**
> >>>>>>>>>>>>>                           * @lru: Pageout list, eg. active_list protected by
> >>>>>>>>>>>>>                           * lruvec->lru_lock.  Sometimes used as a generic list
> >>>>>>>>>>>>>                           * by the page owner.
> >>>>>>>>>>>>>                           */
> >>>>>>>>>>>>>                          union {
> >>>>>>>>>>>>>                                  struct list_head lru;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                                  /* Or, for the Unevictable "LRU list" slot */
> >>>>>>>>>>>>>                                  struct {
> >>>>>>>>>>>>>                                          /* Always even, to negate PageTail */
> >>>>>>>>>>>>>                                          void *__filler;
> >>>>>>>>>>>>>                                          /* Count page's or folio's mlocks */
> >>>>>>>>>>>>>                                          unsigned int mlock_count;
> >>>>>>>>>>>>>                                  };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                                  /* Or, free page */
> >>>>>>>>>>>>>                                  struct list_head buddy_list;
> >>>>>>>>>>>>>                                  struct list_head pcp_list;
> >>>>>>>>>>>>>                          };
> >>>>>>>>>>>>>                          /* See page-flags.h for PAGE_MAPPING_FLAGS */
> >>>>>>>>>>>>>                          struct address_space *mapping;
> >>>>>>>>>>>>>                          union {
> >>>>>>>>>>>>>                                  pgoff_t index;          /* Our offset within mapping. */
> >>>>>>>>>>>>>                                  unsigned long share;    /* share count for fsdax */
> >>>>>>>>>>>>>                          };
> >>>>>>>>>>>>>                          /**
> >>>>>>>>>>>>>                           * @private: Mapping-private opaque data.
> >>>>>>>>>>>>>                           * Usually used for buffer_heads if PagePrivate.
> >>>>>>>>>>>>>                           * Used for swp_entry_t if PageSwapCache.
> >>>>>>>>>>>>>                           * Indicates order in the buddy system if PageBuddy.
> >>>>>>>>>>>>>                           */
> >>>>>>>>>>>>>                          unsigned long private;
> >>>>>>>>>>>>>                  };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> But within the page pool struct, we have a variable called
> >>>>>>>>>>>>> dma_addr that is appropriate for storing dma addr.
> >>>>>>>>>>>>> And that struct is used by netstack. That works to our advantage.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                  struct {        /* page_pool used by netstack */
> >>>>>>>>>>>>>                          /**
> >>>>>>>>>>>>>                           * @pp_magic: magic value to avoid recycling non
> >>>>>>>>>>>>>                           * page_pool allocated pages.
> >>>>>>>>>>>>>                           */
> >>>>>>>>>>>>>                          unsigned long pp_magic;
> >>>>>>>>>>>>>                          struct page_pool *pp;
> >>>>>>>>>>>>>                          unsigned long _pp_mapping_pad;
> >>>>>>>>>>>>>                          unsigned long dma_addr;
> >>>>>>>>>>>>>                          atomic_long_t pp_ref_count;
> >>>>>>>>>>>>>                  };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On the other side, we should use variables from the same sub-struct.
> >>>>>>>>>>>>> So this patch replaces the "private" with "pp".
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>
> >>>>>>>>>>>> Instead of doing a customized version of page pool, can we simply
> >>>>>>>>>>>> switch to use page pool for big mode instead? Then we don't need to
> >>>>>>>>>>>> bother the dma stuffs.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> The page pool needs to do the dma by the DMA APIs.
> >>>>>>>>>>> So we can not use the page pool directly.
> >>>>>>>>>>
> >>>>>>>>>> I found this:
> >>>>>>>>>>
> >>>>>>>>>> define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> >>>>>>>>>>                                          * map/unmap
> >>>>>>>>>>
> >>>>>>>>>> It seems to work here?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have studied the page pool mechanism and believe that we cannot use it
> >>>>>>>>> directly. We can make the page pool to bypass the DMA operations.
> >>>>>>>>> This allows us to handle DMA within virtio-net for pages allocated from the page
> >>>>>>>>> pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> >>>>>>>>> to the page.
> >>>>>>>>>
> >>>>>>>>> However, the critical issue pertains to unmapping. Ideally, we want to return
> >>>>>>>>> the mapped pages to the page pool and reuse them. In doing so, we can omit the
> >>>>>>>>> unmapping and remapping steps.
> >>>>>>>>>
> >>>>>>>>> Currently, there's a caveat: when the page pool cache is full, it disconnects
> >>>>>>>>> and releases the pages. When the pool hits its capacity, pages are relinquished
> >>>>>>>>> without a chance for unmapping.
>
> Could Jakub's memory provider for PP help your use-case?
>
> See: [1]
> https://lore.kernel.org/all/20240403002053.2376017-3-almasrymina@google.com/
> Or: [2]
> https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.com/T/


It cannot. That only lets the page pool get pages via callbacks.

Here we are talking about map/unmap.

virtio-net has its own DMA APIs:

	dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, size_t size,
						  enum dma_data_direction dir, unsigned long attrs);
	void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
					      size_t size, enum dma_data_direction dir,
					      unsigned long attrs);
	dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
						size_t offset, size_t size,
						enum dma_data_direction dir,
						unsigned long attrs);
	void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
					    size_t size, enum dma_data_direction dir,
					    unsigned long attrs);
	int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);

	bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);
	void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq, dma_addr_t addr,
						     unsigned long offset, size_t size,
						     enum dma_data_direction dir);
	void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq, dma_addr_t addr,
							unsigned long offset, size_t size,
							enum dma_data_direction dir);


Thanks.
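
To make the contrast concrete, here is a minimal sketch of how a page pool
created without PP_FLAG_DMA_MAP would still have to go through these
virtqueue helpers; rq->page_pool is hypothetical (virtio-net has no such
field today) and this is not part of the series:

	/* Hypothetical: the page pool does no DMA itself; virtio-net maps the
	 * page through the virtqueue helpers and records the address with a
	 * page pool helper. Note the caveat discussed earlier: pages released
	 * by the pool at capacity would still be mapped.
	 */
	static struct page *virtnet_pp_get_mapped_page(struct receive_queue *rq)
	{
		struct page *page;
		dma_addr_t addr;

		page = page_pool_dev_alloc_pages(rq->page_pool);
		if (!page)
			return NULL;

		addr = virtqueue_dma_map_page_attrs(rq->vq, page, 0, PAGE_SIZE,
						    DMA_FROM_DEVICE, 0);
		if (virtqueue_dma_mapping_error(rq->vq, addr)) {
			page_pool_put_full_page(rq->page_pool, page, false);
			return NULL;
		}

		page_pool_set_dma_addr(page, addr);
		return page;
	}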

>
>
> [...]
> >>>>>>
> >>>>>> Adding Jesper for some comments.
> >>>>>>
> >>>>>>>
> >>>>>>> Back to this patch set, I think we should keep the virtio-net to manage
> >>>>>>> the pages.
> >>>>>>>
>
> For context the patch:
>   [3]
> https://lore.kernel.org/all/20240411025127.51945-4-xuanzhuo@linux.alibaba.com/
>
> >>>>>>> What do you think?
> >>>>>>
> >>>>>> I might be wrong, but I think if we need to either
> >>>>>>
> >>>>>> 1) seek a way to manage the pages by yourself but not touching page
> >>>>>> pool metadata (or Jesper is fine with this)
> >>>>>
> >>>>> Do you mean working with page pool or not?
> >>>>>
> >>>>
> >>>> I meant if Jesper is fine with reusing page pool metadata like this patch.
> >>>>
> >>>>> If we manage the pages by self(no page pool), we do not care the metadata is for
> >>>>> page pool or not. We just use the space of pages like the "private".
> >>>>
> >>>> That's also fine.
> >>>>
>
> I'm not sure it is "fine" to, explicitly choosing not to use page pool,
> and then (ab)use `struct page` member (pp) that intended for page_pool
> for other stuff. (In this case create a linked list of pages).
>
>   +#define page_chain_next(p)	((struct page *)((p)->pp))
>   +#define page_chain_add(p, n)	((p)->pp = (void *)n)
>
> I'm not sure that I (as PP maintainer) can make this call actually, as I
> think this area belong with the MM "page" maintainers (Cc MM-list +
> people) to judge.
>
> Just invention new ways to use struct page fields without adding your
> use-case to struct page, will make it harder for MM people to maintain
> (e.g. make future change).
>
> --Jesper
>
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-19  7:03             ` Xuan Zhuo
@ 2024-04-19  7:24               ` Jason Wang
  2024-04-19  7:26                 ` Xuan Zhuo
  2024-04-19  7:52                 ` Xuan Zhuo
  0 siblings, 2 replies; 49+ messages in thread
From: Jason Wang @ 2024-04-19  7:24 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, Apr 19, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 19 Apr 2024 13:46:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Fri, Apr 19, 2024 at 12:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Fri, 19 Apr 2024 08:43:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Thu, Apr 18, 2024 at 4:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > >
> > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > >
> > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > we remap it before returning it to the chain.
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> > > > > > >  1 file changed, 81 insertions(+), 17 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> > > > > > >
> > > > > > >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> > > > > > >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > > > > > > +#define page_dma_addr(p)       ((p)->dma_addr)
> > > > > > >
> > > > > > >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > > > > > >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > > > > > > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > +{
> > > > > > > +       sg->dma_address = addr;
> > > > > > > +       sg->length = len;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > > > > > > +{
> > > > > > > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > > > > > > +                                      DMA_FROM_DEVICE, 0);
> > > > > > > +
> > > > > > > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > > > > > > +{
> > > > > > > +       dma_addr_t addr;
> > > > > > > +
> > > > > > > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > > > > > > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > > > > > > +               return -ENOMEM;
> > > > > > > +
> > > > > > > +       page_dma_addr(p) = addr;
> > > > > > > +       return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void page_chain_release(struct receive_queue *rq)
> > > > > > > +{
> > > > > > > +       struct page *p, *n;
> > > > > > > +
> > > > > > > +       for (p = rq->pages; p; p = n) {
> > > > > > > +               n = page_chain_next(p);
> > > > > > > +
> > > > > > > +               page_chain_unmap(rq, p);
> > > > > > > +               __free_pages(p, 0);
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       rq->pages = NULL;
> > > > > > > +}
> > > > > > > +
> > > > > > >  /*
> > > > > > >   * put the whole most recent used list in the beginning for reuse
> > > > > > >   */
> > > > > > > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > > > > > >  {
> > > > > > >         struct page *end;
> > > > > > >
> > > > > > > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
> > > > > >
> > > > > > This looks strange, the map should be done during allocation. Under
> > > > > > which condition could we hit this?
> > > > >
> > > > > This first page is umapped before we call page_to_skb().
> > > > > The page can be put back to the link in case of failure.
> > > >
> > > > See below.
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > > +               if (page_chain_map(rq, page)) {
> > > > > > > +                       __free_pages(page, 0);
> > > > > > > +                       return;
> > > > > > > +               }
> > > > > > > +       }
> > > > > > > +
> > > > > > >         /* Find end of list, sew whole thing into vi->rq.pages. */
> > > > > > >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> > > > > > >
> > > > > > > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> > > > > > >                 rq->pages = page_chain_next(p);
> > > > > > >                 /* clear chain here, it is used to chain pages */
> > > > > > >                 page_chain_add(p, NULL);
> > > > > > > -       } else
> > > > > > > +       } else {
> > > > > > >                 p = alloc_page(gfp_mask);
> > > > > > > +
> > > > > > > +               if (page_chain_map(rq, p)) {
> > > > > > > +                       __free_pages(p, 0);
> > > > > > > +                       return NULL;
> > > > > > > +               }
> > > > > > > +       }
> > > > > > > +
> > > > > > >         return p;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > >                         return NULL;
> > > > > > >
> > > > > > >                 page = page_chain_next(page);
> > > > > > > -               if (page)
> > > > > > > -                       give_pages(rq, page);
> > > > > > >                 goto ok;
> > > > > > >         }
> > > > > > >
> > > > > > > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> > > > > > >                 else
> > > > > > >                         page_to_free = page;
> > > > > > > +               page = NULL;
> > > > > > >                 goto ok;
> > > > > > >         }
> > > > > > >
> > > > > > > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > >         BUG_ON(offset >= PAGE_SIZE);
> > > > > > >         while (len) {
> > > > > > >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > > > > > +
> > > > > > > +               /* unmap the page before using it. */
> > > > > > > +               if (!offset)
> > > > > > > +                       page_chain_unmap(rq, page);
> > > > > > > +
> > > > > >
> > > > > > This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?
> > > > >
> > > > > I think we do not need that. Because the umap api does it.
> > > > > We do not work with DMA_SKIP_SYNC;
> > > >
> > > > Well, the problem is unmap is too heavyweight and it reduces the
> > > > effort of trying to avoid map/umaps as much as possible.
> > > >
> > > > For example, for most of the case DMA sync is just a nop. And such
> > > > unmap() cause strange code in give_pages() as we discuss above?
> > >
> > > YES. You are right. For the first page, we just need to sync for cpu.
> > > And we do not need to check the dma status.
> > > But here (in page_to_skb), we need to call unmap, because this page is put
> > > to the skb.
> >
> > Right, but issue still,
> >
> > The only case that we may hit
> >
> >         if (page_dma_addr(page) == DMA_MAPPING_ERROR)
> >
> > is when the packet is smaller than GOOD_COPY_LEN.
> >
> > So if we sync_for_cpu for the head page, we don't do:
> >
> > 1) unmap in the receive_big()
> > 2) do snyc_for_cpu() just before skb_put_data(), so the page could be
> > recycled to the pool without unmapping?
>
>
> I do not get.

I meant something like e1000_copybreak().

>
> I think we can remove the code "if (page_dma_addr(page) == DMA_MAPPING_ERROR)"
> from give_pages(). We just do unmap when the page is leaving virtio-net.

That's the point.

>
> >
> > And I think we should do something similar for the mergeable case?
>
> Do what?
>
> We have used the sync api for mergeable case.

Where?

I see virtnet_rq_get_buf(), which does the sync, but that is done after page_to_skb().

>
>
> >
> > Btw, I found one the misleading comment introduced by f80bd740cb7c9
> >
> >         /* copy small packet so we can reuse these pages */
> >         if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
> >
> > We're not copying but building skb around the head page.
>
> Will fix.
>
> Thanks.

Thanks

>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-19  7:24               ` Jason Wang
@ 2024-04-19  7:26                 ` Xuan Zhuo
  2024-04-19  8:12                   ` Jason Wang
  2024-04-19  7:52                 ` Xuan Zhuo
  1 sibling, 1 reply; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-19  7:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 19 Apr 2024 15:24:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Apr 19, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 19 Apr 2024 13:46:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Apr 19, 2024 at 12:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Fri, 19 Apr 2024 08:43:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Thu, Apr 18, 2024 at 4:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > >
> > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > >
> > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > we remap it before returning it to the chain.
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> > > > > > > >  1 file changed, 81 insertions(+), 17 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> > > > > > > >
> > > > > > > >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> > > > > > > >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > > > > > > > +#define page_dma_addr(p)       ((p)->dma_addr)
> > > > > > > >
> > > > > > > >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > > > > > > >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > > > > > > > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > +{
> > > > > > > > +       sg->dma_address = addr;
> > > > > > > > +       sg->length = len;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > > > > > > > +{
> > > > > > > > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > > > > > > > +                                      DMA_FROM_DEVICE, 0);
> > > > > > > > +
> > > > > > > > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > > > > > > > +{
> > > > > > > > +       dma_addr_t addr;
> > > > > > > > +
> > > > > > > > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > > > > > > > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > > > > > > > +               return -ENOMEM;
> > > > > > > > +
> > > > > > > > +       page_dma_addr(p) = addr;
> > > > > > > > +       return 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void page_chain_release(struct receive_queue *rq)
> > > > > > > > +{
> > > > > > > > +       struct page *p, *n;
> > > > > > > > +
> > > > > > > > +       for (p = rq->pages; p; p = n) {
> > > > > > > > +               n = page_chain_next(p);
> > > > > > > > +
> > > > > > > > +               page_chain_unmap(rq, p);
> > > > > > > > +               __free_pages(p, 0);
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       rq->pages = NULL;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /*
> > > > > > > >   * put the whole most recent used list in the beginning for reuse
> > > > > > > >   */
> > > > > > > > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > > > > > > >  {
> > > > > > > >         struct page *end;
> > > > > > > >
> > > > > > > > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
> > > > > > >
> > > > > > > This looks strange, the map should be done during allocation. Under
> > > > > > > which condition could we hit this?
> > > > > >
> > > > > > This first page is umapped before we call page_to_skb().
> > > > > > The page can be put back to the link in case of failure.
> > > > >
> > > > > See below.
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > > +               if (page_chain_map(rq, page)) {
> > > > > > > > +                       __free_pages(page, 0);
> > > > > > > > +                       return;
> > > > > > > > +               }
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > >         /* Find end of list, sew whole thing into vi->rq.pages. */
> > > > > > > >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> > > > > > > >
> > > > > > > > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> > > > > > > >                 rq->pages = page_chain_next(p);
> > > > > > > >                 /* clear chain here, it is used to chain pages */
> > > > > > > >                 page_chain_add(p, NULL);
> > > > > > > > -       } else
> > > > > > > > +       } else {
> > > > > > > >                 p = alloc_page(gfp_mask);
> > > > > > > > +
> > > > > > > > +               if (page_chain_map(rq, p)) {
> > > > > > > > +                       __free_pages(p, 0);
> > > > > > > > +                       return NULL;
> > > > > > > > +               }
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > >         return p;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > >                         return NULL;
> > > > > > > >
> > > > > > > >                 page = page_chain_next(page);
> > > > > > > > -               if (page)
> > > > > > > > -                       give_pages(rq, page);
> > > > > > > >                 goto ok;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> > > > > > > >                 else
> > > > > > > >                         page_to_free = page;
> > > > > > > > +               page = NULL;
> > > > > > > >                 goto ok;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > >         BUG_ON(offset >= PAGE_SIZE);
> > > > > > > >         while (len) {
> > > > > > > >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > > > > > > +
> > > > > > > > +               /* unmap the page before using it. */
> > > > > > > > +               if (!offset)
> > > > > > > > +                       page_chain_unmap(rq, page);
> > > > > > > > +
> > > > > > >
> > > > > > > This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?
> > > > > >
> > > > > > I think we do not need that. Because the umap api does it.
> > > > > > We do not work with DMA_SKIP_SYNC;
> > > > >
> > > > > Well, the problem is unmap is too heavyweight and it reduces the
> > > > > effort of trying to avoid map/umaps as much as possible.
> > > > >
> > > > > For example, for most of the case DMA sync is just a nop. And such
> > > > > unmap() cause strange code in give_pages() as we discuss above?
> > > >
> > > > YES. You are right. For the first page, we just need to sync for cpu.
> > > > And we do not need to check the dma status.
> > > > But here (in page_to_skb), we need to call unmap, because this page is put
> > > > to the skb.
> > >
> > > Right, but issue still,
> > >
> > > The only case that we may hit
> > >
> > >         if (page_dma_addr(page) == DMA_MAPPING_ERROR)
> > >
> > > is when the packet is smaller than GOOD_COPY_LEN.
> > >
> > > So if we sync_for_cpu for the head page, we don't do:
> > >
> > > 1) unmap in the receive_big()
> > > 2) do snyc_for_cpu() just before skb_put_data(), so the page could be
> > > recycled to the pool without unmapping?
> >
> >
> > I do not get.
>
> I meant something like e1000_copybreak().
>
> >
> > I think we can remove the code "if (page_dma_addr(page) == DMA_MAPPING_ERROR)"
> > from give_pages(). We just do unmap when the page is leaving virtio-net.
>
> That's the point.
>
> >
> > >
> > > And I think we should do something similar for the mergeable case?
> >
> > Do what?
> >
> > We have used the sync api for mergeable case.
>
> Where?
>
> I see virtnet_rq_get_buf which did sync but it is done after the page_to_skb().

What do you mean by "done"?

Do you want to reuse the buffer?

Thanks.

>
> >
> >
> > >
> > > Btw, I found one the misleading comment introduced by f80bd740cb7c9
> > >
> > >         /* copy small packet so we can reuse these pages */
> > >         if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
> > >
> > > We're not copying but building skb around the head page.
> >
> > Will fix.
> >
> > Thanks.
>
> Thanks
>
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-19  7:24               ` Jason Wang
  2024-04-19  7:26                 ` Xuan Zhuo
@ 2024-04-19  7:52                 ` Xuan Zhuo
  1 sibling, 0 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-19  7:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 19 Apr 2024 15:24:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Apr 19, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 19 Apr 2024 13:46:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Apr 19, 2024 at 12:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Fri, 19 Apr 2024 08:43:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Thu, Apr 18, 2024 at 4:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > >
> > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > >
> > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > we remap it before returning it to the chain.
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> > > > > > > >  1 file changed, 81 insertions(+), 17 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> > > > > > > >
> > > > > > > >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> > > > > > > >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > > > > > > > +#define page_dma_addr(p)       ((p)->dma_addr)
> > > > > > > >
> > > > > > > >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > > > > > > >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > > > > > > > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > +{
> > > > > > > > +       sg->dma_address = addr;
> > > > > > > > +       sg->length = len;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > > > > > > > +{
> > > > > > > > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > > > > > > > +                                      DMA_FROM_DEVICE, 0);
> > > > > > > > +
> > > > > > > > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > > > > > > > +{
> > > > > > > > +       dma_addr_t addr;
> > > > > > > > +
> > > > > > > > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > > > > > > > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > > > > > > > +               return -ENOMEM;
> > > > > > > > +
> > > > > > > > +       page_dma_addr(p) = addr;
> > > > > > > > +       return 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void page_chain_release(struct receive_queue *rq)
> > > > > > > > +{
> > > > > > > > +       struct page *p, *n;
> > > > > > > > +
> > > > > > > > +       for (p = rq->pages; p; p = n) {
> > > > > > > > +               n = page_chain_next(p);
> > > > > > > > +
> > > > > > > > +               page_chain_unmap(rq, p);
> > > > > > > > +               __free_pages(p, 0);
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       rq->pages = NULL;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /*
> > > > > > > >   * put the whole most recent used list in the beginning for reuse
> > > > > > > >   */
> > > > > > > > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > > > > > > >  {
> > > > > > > >         struct page *end;
> > > > > > > >
> > > > > > > > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
> > > > > > >
> > > > > > > This looks strange, the map should be done during allocation. Under
> > > > > > > which condition could we hit this?
> > > > > >
> > > > > > This first page is umapped before we call page_to_skb().
> > > > > > The page can be put back to the link in case of failure.
> > > > >
> > > > > See below.
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > > +               if (page_chain_map(rq, page)) {
> > > > > > > > +                       __free_pages(page, 0);
> > > > > > > > +                       return;
> > > > > > > > +               }
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > >         /* Find end of list, sew whole thing into vi->rq.pages. */
> > > > > > > >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> > > > > > > >
> > > > > > > > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> > > > > > > >                 rq->pages = page_chain_next(p);
> > > > > > > >                 /* clear chain here, it is used to chain pages */
> > > > > > > >                 page_chain_add(p, NULL);
> > > > > > > > -       } else
> > > > > > > > +       } else {
> > > > > > > >                 p = alloc_page(gfp_mask);
> > > > > > > > +
> > > > > > > > +               if (page_chain_map(rq, p)) {
> > > > > > > > +                       __free_pages(p, 0);
> > > > > > > > +                       return NULL;
> > > > > > > > +               }
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > >         return p;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > >                         return NULL;
> > > > > > > >
> > > > > > > >                 page = page_chain_next(page);
> > > > > > > > -               if (page)
> > > > > > > > -                       give_pages(rq, page);
> > > > > > > >                 goto ok;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> > > > > > > >                 else
> > > > > > > >                         page_to_free = page;
> > > > > > > > +               page = NULL;
> > > > > > > >                 goto ok;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > >         BUG_ON(offset >= PAGE_SIZE);
> > > > > > > >         while (len) {
> > > > > > > >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > > > > > > +
> > > > > > > > +               /* unmap the page before using it. */
> > > > > > > > +               if (!offset)
> > > > > > > > +                       page_chain_unmap(rq, page);
> > > > > > > > +
> > > > > > >
> > > > > > > This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?
> > > > > >
> > > > > > I think we do not need that. Because the umap api does it.
> > > > > > We do not work with DMA_SKIP_SYNC;
> > > > >
> > > > > Well, the problem is unmap is too heavyweight and it reduces the
> > > > > effort of trying to avoid map/umaps as much as possible.
> > > > >
> > > > > For example, for most of the case DMA sync is just a nop. And such
> > > > > unmap() cause strange code in give_pages() as we discuss above?
> > > >
> > > > YES. You are right. For the first page, we just need to sync for cpu.
> > > > And we do not need to check the dma status.
> > > > But here (in page_to_skb), we need to call unmap, because this page is put
> > > > to the skb.
> > >
> > > Right, but issue still,
> > >
> > > The only case that we may hit
> > >
> > >         if (page_dma_addr(page) == DMA_MAPPING_ERROR)
> > >
> > > is when the packet is smaller than GOOD_COPY_LEN.
> > >
> > > So if we sync_for_cpu for the head page, we don't do:
> > >
> > > 1) unmap in the receive_big()
> > > 2) do snyc_for_cpu() just before skb_put_data(), so the page could be
> > > recycled to the pool without unmapping?
> >
> >
> > I do not get.
>
> I meant something like e1000_copybreak().
>
> >
> > I think we can remove the code "if (page_dma_addr(page) == DMA_MAPPING_ERROR)"
> > from give_pages(). We just do unmap when the page is leaving virtio-net.
>
> That's the point.
>
> >
> > >
> > > And I think we should do something similar for the mergeable case?
> >
> > Do what?
> >
> > We have used the sync api for mergeable case.
>
> Where?
>
> I see virtnet_rq_get_buf which did sync but it is done after the page_to_skb().

Do you want to refill the buffer to the vq?

But the page_frag does not support recycling buffers.

Thanks.
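
Coming back to the big-mode point agreed above (unmap only when a page
actually leaves virtio-net), a rough sketch of how give_pages() might then
look. This is a sketch of that direction, not the posted patch, and it assumes
pages handed back here always keep their DMA mapping:

static void give_pages(struct receive_queue *rq, struct page *page)
{
	struct page *end;

	/* Pages returned here are still DMA-mapped, so the
	 * page_dma_addr(page) == DMA_MAPPING_ERROR remap fallback is gone.
	 */

	/* Find end of list, sew whole thing into vi->rq.pages. */
	for (end = page; page_chain_next(end); end = page_chain_next(end));

	page_chain_add(end, rq->pages);
	rq->pages = page;
}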


>
> >
> >
> > >
> > > Btw, I found one the misleading comment introduced by f80bd740cb7c9
> > >
> > >         /* copy small packet so we can reuse these pages */
> > >         if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
> > >
> > > We're not copying but building skb around the head page.
> >
> > Will fix.
> >
> > Thanks.
>
> Thanks
>
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-19  7:26                 ` Xuan Zhuo
@ 2024-04-19  8:12                   ` Jason Wang
  2024-04-19  8:14                     ` Xuan Zhuo
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Wang @ 2024-04-19  8:12 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, Apr 19, 2024 at 3:28 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 19 Apr 2024 15:24:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Fri, Apr 19, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Fri, 19 Apr 2024 13:46:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Fri, Apr 19, 2024 at 12:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Fri, 19 Apr 2024 08:43:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Thu, Apr 18, 2024 at 4:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > > >
> > > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > > >
> > > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > > we remap it before returning it to the chain.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> > > > > > > > >  1 file changed, 81 insertions(+), 17 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> > > > > > > > >
> > > > > > > > >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> > > > > > > > >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > > > > > > > > +#define page_dma_addr(p)       ((p)->dma_addr)
> > > > > > > > >
> > > > > > > > >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > > > > > > > >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > > > > > > > > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > > +{
> > > > > > > > > +       sg->dma_address = addr;
> > > > > > > > > +       sg->length = len;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > > > > > > > > +{
> > > > > > > > > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > > > > > > > > +                                      DMA_FROM_DEVICE, 0);
> > > > > > > > > +
> > > > > > > > > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > > > > > > > > +{
> > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > +
> > > > > > > > > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > > > > > > > > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > > > > > > > > +               return -ENOMEM;
> > > > > > > > > +
> > > > > > > > > +       page_dma_addr(p) = addr;
> > > > > > > > > +       return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void page_chain_release(struct receive_queue *rq)
> > > > > > > > > +{
> > > > > > > > > +       struct page *p, *n;
> > > > > > > > > +
> > > > > > > > > +       for (p = rq->pages; p; p = n) {
> > > > > > > > > +               n = page_chain_next(p);
> > > > > > > > > +
> > > > > > > > > +               page_chain_unmap(rq, p);
> > > > > > > > > +               __free_pages(p, 0);
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +       rq->pages = NULL;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /*
> > > > > > > > >   * put the whole most recent used list in the beginning for reuse
> > > > > > > > >   */
> > > > > > > > > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > > > > > > > >  {
> > > > > > > > >         struct page *end;
> > > > > > > > >
> > > > > > > > > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
> > > > > > > >
> > > > > > > > This looks strange, the map should be done during allocation. Under
> > > > > > > > which condition could we hit this?
> > > > > > >
> > > > > > > This first page is umapped before we call page_to_skb().
> > > > > > > The page can be put back to the link in case of failure.
> > > > > >
> > > > > > See below.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > +               if (page_chain_map(rq, page)) {
> > > > > > > > > +                       __free_pages(page, 0);
> > > > > > > > > +                       return;
> > > > > > > > > +               }
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > >         /* Find end of list, sew whole thing into vi->rq.pages. */
> > > > > > > > >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> > > > > > > > >
> > > > > > > > > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> > > > > > > > >                 rq->pages = page_chain_next(p);
> > > > > > > > >                 /* clear chain here, it is used to chain pages */
> > > > > > > > >                 page_chain_add(p, NULL);
> > > > > > > > > -       } else
> > > > > > > > > +       } else {
> > > > > > > > >                 p = alloc_page(gfp_mask);
> > > > > > > > > +
> > > > > > > > > +               if (page_chain_map(rq, p)) {
> > > > > > > > > +                       __free_pages(p, 0);
> > > > > > > > > +                       return NULL;
> > > > > > > > > +               }
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > >         return p;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > >                         return NULL;
> > > > > > > > >
> > > > > > > > >                 page = page_chain_next(page);
> > > > > > > > > -               if (page)
> > > > > > > > > -                       give_pages(rq, page);
> > > > > > > > >                 goto ok;
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> > > > > > > > >                 else
> > > > > > > > >                         page_to_free = page;
> > > > > > > > > +               page = NULL;
> > > > > > > > >                 goto ok;
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > >         BUG_ON(offset >= PAGE_SIZE);
> > > > > > > > >         while (len) {
> > > > > > > > >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > > > > > > > +
> > > > > > > > > +               /* unmap the page before using it. */
> > > > > > > > > +               if (!offset)
> > > > > > > > > +                       page_chain_unmap(rq, page);
> > > > > > > > > +
> > > > > > > >
> > > > > > > > This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?
> > > > > > >
> > > > > > > I think we do not need that. Because the umap api does it.
> > > > > > > We do not work with DMA_SKIP_SYNC;
> > > > > >
> > > > > > Well, the problem is unmap is too heavyweight and it reduces the
> > > > > > effort of trying to avoid map/umaps as much as possible.
> > > > > >
> > > > > > For example, for most of the case DMA sync is just a nop. And such
> > > > > > unmap() cause strange code in give_pages() as we discuss above?
> > > > >
> > > > > YES. You are right. For the first page, we just need to sync for cpu.
> > > > > And we do not need to check the dma status.
> > > > > But here (in page_to_skb), we need to call unmap, because this page is put
> > > > > to the skb.
> > > >
> > > > Right, but issue still,
> > > >
> > > > The only case that we may hit
> > > >
> > > >         if (page_dma_addr(page) == DMA_MAPPING_ERROR)
> > > >
> > > > is when the packet is smaller than GOOD_COPY_LEN.
> > > >
> > > > So if we sync_for_cpu for the head page, we don't do:
> > > >
> > > > 1) unmap in the receive_big()
> > > > 2) do snyc_for_cpu() just before skb_put_data(), so the page could be
> > > > recycled to the pool without unmapping?
> > >
> > >
> > > I do not get.
> >
> > I meant something like e1000_copybreak().
> >
> > >
> > > I think we can remove the code "if (page_dma_addr(page) == DMA_MAPPING_ERROR)"
> > > from give_pages(). We just do unmap when the page is leaving virtio-net.
> >
> > That's the point.
> >
> > >
> > > >
> > > > And I think we should do something similar for the mergeable case?
> > >
> > > Do what?
> > >
> > > We have used the sync api for mergeable case.
> >
> > Where?
> >
> > I see virtnet_rq_get_buf which did sync but it is done after the page_to_skb().
>
> > What do you mean by "done"?
>
> Do you want to reuse the buffer?

Nope, I think I misread the code. Mergeable buffers should be fine as
the unmap is done during virtnet_receive().

But the code might need some tweaks in the future.

in virtnet_receive() we had:

if (!vi->big_packets || vi->mergeable_rx_bufs) {
        virtnet_rq_get_buf
        receive_buf()
} else {
        virtqueue_get_buf()
}

but there's another switch in receive_buf():

if (vi->mergeable_rx_bufs)
else if (vi->big_packets)
else
...

Which is kind of a mess.

Thanks
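
One possible way to tidy that up is to decide the mode in exactly one place
and let both virtnet_receive() and receive_buf() consume the result. The enum
and helper below are purely illustrative (they are not existing driver code);
only the vi->mergeable_rx_bufs and vi->big_packets fields come from the driver:

enum virtnet_rx_mode {
	VIRTNET_RX_SMALL,
	VIRTNET_RX_BIG,
	VIRTNET_RX_MERGEABLE,
};

static enum virtnet_rx_mode virtnet_rx_mode(const struct virtnet_info *vi)
{
	/* Single source of truth for selecting the rx path. */
	if (vi->mergeable_rx_bufs)
		return VIRTNET_RX_MERGEABLE;
	if (vi->big_packets)
		return VIRTNET_RX_BIG;
	return VIRTNET_RX_SMALL;
}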
>
> Thanks.
>
> >
> > >
> > >
> > > >
> > > > Btw, I found one the misleading comment introduced by f80bd740cb7c9
> > > >
> > > >         /* copy small packet so we can reuse these pages */
> > > >         if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
> > > >
> > > > We're not copying but building skb around the head page.
> > >
> > > Will fix.
> > >
> > > Thanks.
> >
> > Thanks
> >
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH vhost 4/6] virtio_net: big mode support premapped
  2024-04-19  8:12                   ` Jason Wang
@ 2024-04-19  8:14                     ` Xuan Zhuo
  0 siblings, 0 replies; 49+ messages in thread
From: Xuan Zhuo @ 2024-04-19  8:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 19 Apr 2024 16:12:15 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Apr 19, 2024 at 3:28 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 19 Apr 2024 15:24:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Apr 19, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Fri, 19 Apr 2024 13:46:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Fri, Apr 19, 2024 at 12:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Fri, 19 Apr 2024 08:43:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Thu, Apr 18, 2024 at 4:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, 18 Apr 2024 14:25:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > > > >
> > > > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > > > >
> > > > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > > > we remap it before returning it to the chain.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/net/virtio_net.c | 98 +++++++++++++++++++++++++++++++++-------
> > > > > > > > > >  1 file changed, 81 insertions(+), 17 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > index 4446fb54de6d..7ea7e9bcd5d7 100644
> > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > @@ -50,6 +50,7 @@ module_param(napi_tx, bool, 0644);
> > > > > > > > > >
> > > > > > > > > >  #define page_chain_next(p)     ((struct page *)((p)->pp))
> > > > > > > > > >  #define page_chain_add(p, n)   ((p)->pp = (void *)n)
> > > > > > > > > > +#define page_dma_addr(p)       ((p)->dma_addr)
> > > > > > > > > >
> > > > > > > > > >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > > > > > > > > >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> > > > > > > > > > @@ -434,6 +435,46 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > > > +{
> > > > > > > > > > +       sg->dma_address = addr;
> > > > > > > > > > +       sg->length = len;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void page_chain_unmap(struct receive_queue *rq, struct page *p)
> > > > > > > > > > +{
> > > > > > > > > > +       virtqueue_dma_unmap_page_attrs(rq->vq, page_dma_addr(p), PAGE_SIZE,
> > > > > > > > > > +                                      DMA_FROM_DEVICE, 0);
> > > > > > > > > > +
> > > > > > > > > > +       page_dma_addr(p) = DMA_MAPPING_ERROR;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static int page_chain_map(struct receive_queue *rq, struct page *p)
> > > > > > > > > > +{
> > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > +
> > > > > > > > > > +       addr = virtqueue_dma_map_page_attrs(rq->vq, p, 0, PAGE_SIZE, DMA_FROM_DEVICE, 0);
> > > > > > > > > > +       if (virtqueue_dma_mapping_error(rq->vq, addr))
> > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > +
> > > > > > > > > > +       page_dma_addr(p) = addr;
> > > > > > > > > > +       return 0;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void page_chain_release(struct receive_queue *rq)
> > > > > > > > > > +{
> > > > > > > > > > +       struct page *p, *n;
> > > > > > > > > > +
> > > > > > > > > > +       for (p = rq->pages; p; p = n) {
> > > > > > > > > > +               n = page_chain_next(p);
> > > > > > > > > > +
> > > > > > > > > > +               page_chain_unmap(rq, p);
> > > > > > > > > > +               __free_pages(p, 0);
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +       rq->pages = NULL;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  /*
> > > > > > > > > >   * put the whole most recent used list in the beginning for reuse
> > > > > > > > > >   */
> > > > > > > > > > @@ -441,6 +482,13 @@ static void give_pages(struct receive_queue *rq, struct page *page)
> > > > > > > > > >  {
> > > > > > > > > >         struct page *end;
> > > > > > > > > >
> > > > > > > > > > +       if (page_dma_addr(page) == DMA_MAPPING_ERROR) {
> > > > > > > > >
> > > > > > > > > This looks strange, the map should be done during allocation. Under
> > > > > > > > > which condition could we hit this?
> > > > > > > >
> > > > > > > > This first page is umapped before we call page_to_skb().
> > > > > > > > The page can be put back to the link in case of failure.
> > > > > > >
> > > > > > > See below.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > +               if (page_chain_map(rq, page)) {
> > > > > > > > > > +                       __free_pages(page, 0);
> > > > > > > > > > +                       return;
> > > > > > > > > > +               }
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > >         /* Find end of list, sew whole thing into vi->rq.pages. */
> > > > > > > > > >         for (end = page; page_chain_next(end); end = page_chain_next(end));
> > > > > > > > > >
> > > > > > > > > > @@ -456,8 +504,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> > > > > > > > > >                 rq->pages = page_chain_next(p);
> > > > > > > > > >                 /* clear chain here, it is used to chain pages */
> > > > > > > > > >                 page_chain_add(p, NULL);
> > > > > > > > > > -       } else
> > > > > > > > > > +       } else {
> > > > > > > > > >                 p = alloc_page(gfp_mask);
> > > > > > > > > > +
> > > > > > > > > > +               if (page_chain_map(rq, p)) {
> > > > > > > > > > +                       __free_pages(p, 0);
> > > > > > > > > > +                       return NULL;
> > > > > > > > > > +               }
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > >         return p;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > @@ -613,8 +668,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > >                         return NULL;
> > > > > > > > > >
> > > > > > > > > >                 page = page_chain_next(page);
> > > > > > > > > > -               if (page)
> > > > > > > > > > -                       give_pages(rq, page);
> > > > > > > > > >                 goto ok;
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > > @@ -640,6 +693,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > >                         skb_add_rx_frag(skb, 0, page, offset, len, truesize);
> > > > > > > > > >                 else
> > > > > > > > > >                         page_to_free = page;
> > > > > > > > > > +               page = NULL;
> > > > > > > > > >                 goto ok;
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > > @@ -657,6 +711,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > >         BUG_ON(offset >= PAGE_SIZE);
> > > > > > > > > >         while (len) {
> > > > > > > > > >                 unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > > > > > > > > +
> > > > > > > > > > +               /* unmap the page before using it. */
> > > > > > > > > > +               if (!offset)
> > > > > > > > > > +                       page_chain_unmap(rq, page);
> > > > > > > > > > +
> > > > > > > > >
> > > > > > > > > This sounds strange, do we need a virtqueue_sync_for_cpu() helper here?
> > > > > > > >
> > > > > > > > I think we do not need that. Because the umap api does it.
> > > > > > > > We do not work with DMA_SKIP_SYNC;
> > > > > > >
> > > > > > > Well, the problem is unmap is too heavyweight and it reduces the
> > > > > > > effort of trying to avoid map/umaps as much as possible.
> > > > > > >
> > > > > > > For example, for most of the case DMA sync is just a nop. And such
> > > > > > > unmap() cause strange code in give_pages() as we discuss above?
> > > > > >
> > > > > > YES. You are right. For the first page, we just need to sync for cpu.
> > > > > > And we do not need to check the dma status.
> > > > > > But here (in page_to_skb), we need to call unmap, because this page is put
> > > > > > to the skb.
> > > > >
> > > > > Right, but issue still,
> > > > >
> > > > > The only case that we may hit
> > > > >
> > > > >         if (page_dma_addr(page) == DMA_MAPPING_ERROR)
> > > > >
> > > > > is when the packet is smaller than GOOD_COPY_LEN.
> > > > >
> > > > > So if we sync_for_cpu for the head page, we don't do:
> > > > >
> > > > > 1) unmap in the receive_big()
> > > > > 2) do snyc_for_cpu() just before skb_put_data(), so the page could be
> > > > > recycled to the pool without unmapping?
> > > >
> > > >
> > > > I do not get.
> > >
> > > I meant something like e1000_copybreak().
> > >
> > > >
> > > > I think we can remove the code "if (page_dma_addr(page) == DMA_MAPPING_ERROR)"
> > > > from give_pages(). We just do unmap when the page is leaving virtio-net.
> > >
> > > That's the point.
> > >
> > > >
> > > > >
> > > > > And I think we should do something similar for the mergeable case?
> > > >
> > > > Do what?
> > > >
> > > > We have used the sync api for mergeable case.
> > >
> > > Where?
> > >
> > > I see virtnet_rq_get_buf which did sync but it is done after the page_to_skb().
> >
> > What do you mean by "done"?
> >
> > Do you want to reuse the buffer?
>
> Nope, I think I misread the code. Mergeable buffers should be fine as
> the unmap is done during virtnet_receive().
>
> But the code might need some tweaks in the future.
>
> in virtnet_receive() we had:
>
> if (!vi->big_packets || vi->mergeable_rx_bufs) {
>         virtnet_rq_get_buf
>         receive_buf()
> } else {
>         virtqueue_get_buf()
> }
>
> but there's another switch in receive_buf():
>
> if (vi->mergeable_rx_bufs)
> else if (vi->big_packets)
> else
> ...
>
> Which is kind of a mess.

YES. I will change this in the AF_XDP patch set.

Thanks.


>
> Thanks
> >
> > Thanks.
> >
> > >
> > > >
> > > >
> > > > >
> > > > > Btw, I found one the misleading comment introduced by f80bd740cb7c9
> > > > >
> > > > >         /* copy small packet so we can reuse these pages */
> > > > >         if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
> > > > >
> > > > > We're not copying but building skb around the head page.
> > > >
> > > > Will fix.
> > > >
> > > > Thanks.
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2024-04-19  8:15 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-11  2:51 [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default Xuan Zhuo
2024-04-11  2:51 ` [PATCH vhost 1/6] virtio_ring: introduce dma map api for page Xuan Zhuo
2024-04-11 11:45   ` Alexander Lobakin
2024-04-12  3:48     ` Xuan Zhuo
2024-04-18  6:08   ` Jason Wang
2024-04-11  2:51 ` [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
2024-04-18  6:09   ` Jason Wang
2024-04-18  6:13   ` Jason Wang
2024-04-11  2:51 ` [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page Xuan Zhuo
2024-04-12  4:47   ` Jason Wang
2024-04-12  5:35     ` Xuan Zhuo
2024-04-12  5:49       ` Jason Wang
2024-04-12  6:02         ` Xuan Zhuo
2024-04-15  2:08         ` Xuan Zhuo
2024-04-15  6:43           ` Jason Wang
2024-04-15  8:36             ` Xuan Zhuo
2024-04-15  8:56               ` Jason Wang
2024-04-15  8:59                 ` Xuan Zhuo
2024-04-16  3:24                   ` Jason Wang
2024-04-17  1:30                     ` Xuan Zhuo
2024-04-17  4:08                       ` Jason Wang
2024-04-17  8:20                         ` Xuan Zhuo
2024-04-18  4:15                           ` Jason Wang
2024-04-18  4:16                             ` Jason Wang
2024-04-18 20:19                           ` Jesper Dangaard Brouer
2024-04-18 21:56                             ` Matthew Wilcox
2024-04-19  7:11                             ` Xuan Zhuo
2024-04-18  6:11   ` Jason Wang
2024-04-11  2:51 ` [PATCH vhost 4/6] virtio_net: big mode support premapped Xuan Zhuo
2024-04-11 16:34   ` kernel test robot
2024-04-11 20:11   ` kernel test robot
2024-04-14  9:48   ` Dan Carpenter
2024-04-18  6:25   ` Jason Wang
2024-04-18  8:29     ` Xuan Zhuo
2024-04-19  0:43       ` Jason Wang
2024-04-19  4:21         ` Xuan Zhuo
2024-04-19  5:46           ` Jason Wang
2024-04-19  7:03             ` Xuan Zhuo
2024-04-19  7:24               ` Jason Wang
2024-04-19  7:26                 ` Xuan Zhuo
2024-04-19  8:12                   ` Jason Wang
2024-04-19  8:14                     ` Xuan Zhuo
2024-04-19  7:52                 ` Xuan Zhuo
2024-04-11  2:51 ` [PATCH vhost 5/6] virtio_net: enable premapped by default Xuan Zhuo
2024-04-18  6:26   ` Jason Wang
2024-04-18  8:35     ` Xuan Zhuo
2024-04-19  0:44       ` Jason Wang
2024-04-11  2:51 ` [PATCH vhost 6/6] virtio_net: rx remove premapped failover code Xuan Zhuo
2024-04-18  6:31   ` Jason Wang
