From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maxime Coquelin
Subject: Re: [PATCH v6 6/6] vhost: optimize cache access
Date: Wed, 21 Sep 2016 06:32:54 +0200
Message-ID: <2482b769-c5db-95a4-be52-18b444f75dfb@redhat.com>
References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com>
 <1474336817-22683-1-git-send-email-zhihong.wang@intel.com>
 <1474336817-22683-7-git-send-email-zhihong.wang@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: yuanhan.liu@linux.intel.com, thomas.monjalon@6wind.com
To: Zhihong Wang, dev@dpdk.org
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by dpdk.org (Postfix) with ESMTP id D9F062B84
 for ; Wed, 21 Sep 2016 06:32:59 +0200 (CEST)
In-Reply-To: <1474336817-22683-7-git-send-email-zhihong.wang@intel.com>
List-Id: patches and discussions about DPDK
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On 09/20/2016 04:00 AM, Zhihong Wang wrote:
> This patch reorders the code to delay the virtio header write to improve
> cache access efficiency for cases where the mrg_rxbuf feature is turned
> on. CPU pipeline stall cycles can be significantly reduced.
>
> Virtio header write and mbuf data copy are all remote store operations
> which take a long time to finish. It's a good idea to put them together
> to remove bubbles in between, to let as many remote store instructions
> as possible go into the store buffer at the same time to hide latency, and
> to let the H/W prefetcher go to work as early as possible.
>
> On a Haswell machine, about 100 cycles can be saved per packet by this
> patch alone. Taking 64B packet traffic for example, this means about 60%
> efficiency improvement for the enqueue operation.

Thanks for the detailed information, I appreciate it.

Maxime