From mboxrd@z Thu Jan  1 00:00:00 1970
From: Zhihong Wang <zhihong.wang@intel.com>
Subject: [PATCH v6 6/6] vhost: optimize cache access
Date: Mon, 19 Sep 2016 22:00:17 -0400
Message-ID: <1474336817-22683-7-git-send-email-zhihong.wang@intel.com>
References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com>
 <1474336817-22683-1-git-send-email-zhihong.wang@intel.com>
Cc: maxime.coquelin@redhat.com, yuanhan.liu@linux.intel.com,
 thomas.monjalon@6wind.com, Zhihong Wang <zhihong.wang@intel.com>
To: dev@dpdk.org
In-Reply-To: <1474336817-22683-1-git-send-email-zhihong.wang@intel.com>

This patch reorders the code to delay the virtio header write, which
improves cache access efficiency when the mrg_rxbuf feature is turned
on. CPU pipeline stall cycles can be significantly reduced.

The virtio header write and the mbuf data copy are both remote store
operations, which take a long time to finish. Putting them together
removes the bubbles in between, lets as many remote store instructions
as possible enter the store buffer at the same time to hide latency,
and lets the H/W prefetcher go to work as early as possible.
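
As a rough, standalone illustration of the ordering described above (not
the vhost code itself), the sketch below defers the header store until
just before the first payload copy. The names demo_hdr, demo_seg and
demo_enqueue are hypothetical stand-ins, and plain memcpy is assumed in
place of rte_memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct virtio_net_hdr_mrg_rxbuf. */
struct demo_hdr {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t num_buffers;
};

/* Hypothetical stand-in for one segment of an mbuf chain. */
struct demo_seg {
	const uint8_t *data;
	size_t len;
};

/*
 * Copy segments into a buffer laid out as [header][payload].
 * The header is not written where it is first located; a flag defers
 * the store until the first payload copy so that both stores to the
 * destination buffer are issued back to back.
 * Returns the number of payload bytes copied.
 */
static size_t
demo_enqueue(uint8_t *buf, size_t buf_len,
		const struct demo_seg *segs, size_t nb_segs)
{
	struct demo_hdr hdr;
	size_t offset = sizeof(hdr);
	size_t copied = 0;
	int copy_hdr = 1; /* header write still pending */
	size_t i;

	assert(buf != NULL);
	if (buf_len < offset)
		return 0;

	for (i = 0; i < nb_segs; i++) {
		size_t avail = buf_len - offset;
		size_t cpy_len = segs[i].len < avail ? segs[i].len : avail;

		if (avail == 0)
			break;

		/* delayed header store, done once, next to the data copy */
		if (copy_hdr) {
			copy_hdr = 0;
			memset(&hdr, 0, sizeof(hdr));
			hdr.num_buffers = 1;
			memcpy(buf, &hdr, sizeof(hdr));
		}

		memcpy(buf + offset, segs[i].data, cpy_len);
		offset += cpy_len;
		copied += cpy_len;
	}

	return copied;
}
```

Only the placement of the header store relative to the data copy is
meant to carry over; the sketch uses a single destination buffer and
does not model descriptor chains.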

On a Haswell machine, this patch alone saves about 100 cycles per
packet. Taking 64B packet traffic as an example, this means about a 60%
efficiency improvement for the enqueue operation.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
---
Changes in v3:

 1. Remove unnecessary memset, which causes frontend stall on SNB & IVB.

 2. Rename variables to follow naming convention.

 lib/librte_vhost/virtio_net.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 8f2882b..11a2c1a 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -185,6 +185,7 @@ enqueue_packet(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	uint32_t mbuf_len;
 	uint32_t mbuf_avail;
 	uint32_t cpy_len;
+	uint32_t copy_virtio_hdr;
 	uint32_t num_buffers = 0;
 
 	/* start with the first mbuf of the packet */
@@ -199,12 +200,12 @@ enqueue_packet(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (unlikely(!desc_addr))
 		goto error;
 
-	/* handle virtio header */
+	/*
+	 * Handle the virtio header. The actual write operation is
+	 * delayed for cache optimization, to reduce CPU pipeline stalls.
+	 */
 	virtio_hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr;
-	virtio_enqueue_offload(mbuf, &(virtio_hdr->hdr));
-	if (is_mrg_rxbuf)
-		virtio_hdr->num_buffers = 1;
-
+	copy_virtio_hdr = 1;
 	vhost_log_write(dev, desc->addr, dev->vhost_hlen);
 	PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
 	desc_offset = dev->vhost_hlen;
@@ -249,8 +250,15 @@ enqueue_packet(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				goto error;
 		}
 
-		/* copy mbuf data */
+		/* copy virtio header and mbuf data */
 		cpy_len = RTE_MIN(desc->len - desc_offset, mbuf_avail);
+		if (copy_virtio_hdr) {
+			copy_virtio_hdr = 0;
+			virtio_enqueue_offload(mbuf, &(virtio_hdr->hdr));
+			if (is_mrg_rxbuf)
+				virtio_hdr->num_buffers = num_buffers + 1;
+		}
+
 		rte_memcpy((void *)(uintptr_t)desc_addr,
 				rte_pktmbuf_mtod_offset(mbuf, void *,
 					mbuf_len - mbuf_avail),
-- 
2.7.4