From mboxrd@z Thu Jan  1 00:00:00 1970
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Subject: Re: [PATCH v6 6/6] vhost: optimize cache access
Date: Wed, 21 Sep 2016 06:32:54 +0200
Message-ID: <2482b769-c5db-95a4-be52-18b444f75dfb@redhat.com>
References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com>
 <1474336817-22683-1-git-send-email-zhihong.wang@intel.com>
 <1474336817-22683-7-git-send-email-zhihong.wang@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: yuanhan.liu@linux.intel.com, thomas.monjalon@6wind.com
To: Zhihong Wang <zhihong.wang@intel.com>, dev@dpdk.org
Return-path: <dev-bounces@dpdk.org>
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by dpdk.org (Postfix) with ESMTP id D9F062B84
 for <dev@dpdk.org>; Wed, 21 Sep 2016 06:32:59 +0200 (CEST)
In-Reply-To: <1474336817-22683-7-git-send-email-zhihong.wang@intel.com>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>


On 09/20/2016 04:00 AM, Zhihong Wang wrote:
> This patch reorders the code to delay virtio header write to improve
> cache access efficiency for cases where the mrg_rxbuf feature is turned
> on. CPU pipeline stall cycles can be significantly reduced.
>
> Virtio header write and mbuf data copy are both remote store operations
> which take a long time to finish. It's a good idea to put them together
> to remove bubbles in between, to let as many remote store instructions
> as possible go into the store buffer at the same time to hide latency, and
> to let the H/W prefetcher go to work as early as possible.
>
> On a Haswell machine, about 100 cycles can be saved per packet by this
> patch alone. Taking 64B packet traffic for example, this means about 60%
> efficiency improvement for the enqueue operation.
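
For readers following the thread, the reordering described above can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the patch: struct guest_buf, struct pkt,
struct pkt_hdr and both function names are hypothetical stand-ins for the
real vhost structures and enqueue path. Both variants write the same bytes;
they differ only in whether the header store sits next to the data copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the DPDK vhost structures. */
struct pkt_hdr   { uint16_t num_buffers; };
struct guest_buf { struct pkt_hdr hdr; uint8_t data[2048]; };
struct pkt       { const uint8_t *payload; size_t len; uint16_t num_buffers; };

/*
 * Early header write: the header store to guest memory is issued first,
 * then local bookkeeping runs, then the data copy. The two groups of
 * remote stores are separated by unrelated work.
 */
static size_t enqueue_early_hdr(struct guest_buf *dst, const struct pkt *p,
                                uint16_t *used_idx)
{
    if (p->len > sizeof(dst->data))
        return 0;                                   /* does not fit */
    dst->hdr.num_buffers = p->num_buffers;          /* remote store */
    *used_idx = (uint16_t)(*used_idx + p->num_buffers); /* local work */
    memcpy(dst->data, p->payload, p->len);          /* remote stores */
    return p->len;
}

/*
 * Delayed header write: local bookkeeping runs first, and the header
 * store is issued immediately before the data copy, so all stores to
 * guest memory are back to back.
 */
static size_t enqueue_late_hdr(struct guest_buf *dst, const struct pkt *p,
                               uint16_t *used_idx)
{
    if (p->len > sizeof(dst->data))
        return 0;                                   /* does not fit */
    *used_idx = (uint16_t)(*used_idx + p->num_buffers); /* local work */
    dst->hdr.num_buffers = p->num_buffers;          /* remote store */
    memcpy(dst->data, p->payload, p->len);          /* remote stores */
    return p->len;
}
```

Whether the second ordering is actually faster depends on the CPU and on
where the destination memory lives; the sketch makes no claim about cycle
counts and only shows that the visible result is unchanged.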

Thanks for the detailed information, I appreciate it.

Maxime