From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yuanhan Liu
Subject: Re: [PATCH v6 0/6] vhost: optimize enqueue
Date: Wed, 21 Sep 2016 10:26:56 +0800
Message-ID: <20160921022656.GA23158@yliu-dev.sh.intel.com>
References: <1474336817-22683-1-git-send-email-zhihong.wang@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Zhihong Wang, dev@dpdk.org, thomas.monjalon@6wind.com
To: maxime.coquelin@redhat.com
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
	by dpdk.org (Postfix) with ESMTP id F2DBA37B8;
	Wed, 21 Sep 2016 04:26:20 +0200 (CEST)
Content-Disposition: inline
In-Reply-To: <1474336817-22683-1-git-send-email-zhihong.wang@intel.com>
List-Id: patches and discussions about DPDK
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hi Maxime,

Do you have more comments about this set? If not, I think I could merge
it shortly.

Thanks.

	--yliu

On Mon, Sep 19, 2016 at 10:00:11PM -0400, Zhihong Wang wrote:
> This patch set optimizes the vhost enqueue function.
>
> It reimplements the vhost logic from scratch as a single function designed
> for high performance and good maintainability, and improves CPU efficiency
> significantly by optimizing cache access, which means:
>
>  *  Higher maximum throughput can be achieved for fast frontends like DPDK
>     virtio pmd.
>
>  *  Better scalability can be achieved, in that each vhost core can support
>     more connections because it takes fewer cycles to handle each single
>     frontend.
>
> This patch set contains:
>
>  1. A Windows VM compatibility fix for vhost enqueue in the 16.07 release.
>
>  2. A baseline patch to rewrite the vhost logic.
>
>  3. A series of optimization patches added upon the baseline.
>
> The main optimization techniques are:
>
>  1. Reorder code to reduce CPU pipeline stall cycles.
>
>  2. Batch update the used ring for better efficiency.
>
>  3. Prefetch descriptor to hide cache latency.
>
>  4. Remove useless volatile attribute to allow compiler optimization.
>
> Code reordering and batch used ring update bring most of the performance
> improvements.
>
> In the existing code there are 2 callbacks for vhost enqueue:
>
>  *  virtio_dev_merge_rx for mrg_rxbuf turned on cases.
>
>  *  virtio_dev_rx for mrg_rxbuf turned off cases.
>
> The performance of the existing code is not optimal, especially when the
> mrg_rxbuf feature is turned on. Besides, having 2 callback paths increases
> maintenance effort.
>
> Also, there's a compatibility issue in the existing code which causes
> Windows VMs to hang when the mrg_rxbuf feature is turned on.
>
> ---
> Changes in v6:
>
>  1. Merge duplicated code.
>
>  2. Introduce a function for used ring write.
>
>  3. Add necessary comments.
>
> ---
> Changes in v5:
>
>  1. Rebase to dpdk-next-virtio master.
>
>  2. Rename variables to keep the naming style consistent.
>
>  3. Small changes like return value adjustment and vertical alignment.
>
>  4. Add details in commit log.
>
> ---
> Changes in v4:
>
>  1. Fix a Windows VM compatibility issue.
>
>  2. Free shadow used ring in the right place.
>
>  3. Add failure check for shadow used ring malloc.
>
>  4. Refactor the code for clearer logic.
>
>  5. Add PRINT_PACKET for debugging.
>
> ---
> Changes in v3:
>
>  1. Remove unnecessary memset which causes frontend stall on SNB & IVB.
>
>  2. Rename variables to follow naming convention.
>
>  3. Rewrite enqueue and delete the obsolete code in the same patch.
>
> ---
> Changes in v2:
>
>  1. Split the big function into several small ones.
>
>  2. Use multiple patches to explain each optimization.
>
>  3. Add comments.
>
> Zhihong Wang (6):
>   vhost: fix windows vm hang
>   vhost: rewrite enqueue
>   vhost: remove useless volatile
>   vhost: add desc prefetch
>   vhost: batch update used ring
>   vhost: optimize cache access
>
>  lib/librte_vhost/vhost.c      |  20 +-
>  lib/librte_vhost/vhost.h      |   6 +-
>  lib/librte_vhost/vhost_user.c |  31 ++-
>  lib/librte_vhost/virtio_net.c | 541 ++++++++++++++----------------------------
>  4 files changed, 225 insertions(+), 373 deletions(-)
>
> --
> 2.7.4
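
For readers following the thread: technique 2 in the quoted cover letter
(batching used ring updates through a shadow ring) can be sketched roughly
as below. This is only an illustration under stated assumptions, not the
code from the patch set. The names used_ring, shadow_used, shadow_add,
shadow_flush and RING_SIZE are hypothetical, the ring size is fixed at
compile time, and the release fence stands in for whatever write barrier
the real library uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8u /* hypothetical ring size; must be a power of two */

struct used_elem {
    uint32_t id;  /* head index of the completed descriptor chain */
    uint32_t len; /* number of bytes written into that chain */
};

/* Stand-in for the used ring shared with the guest. */
struct used_ring {
    uint16_t idx; /* free-running index, masked when used as an offset */
    struct used_elem ring[RING_SIZE];
};

/* Host-private staging area: entries collect here during a burst. */
struct shadow_used {
    struct used_elem elem[RING_SIZE];
    uint16_t count;
};

/* Stage one completed chain; returns -1 if the shadow ring is full. */
static int shadow_add(struct shadow_used *s, uint32_t id, uint32_t len)
{
    if (s->count >= RING_SIZE)
        return -1;
    s->elem[s->count].id = id;
    s->elem[s->count].len = len;
    s->count++;
    return 0;
}

/*
 * Copy all staged entries to the shared ring in at most two memcpy calls
 * (two when the burst wraps past the end), then publish them with a
 * single index update instead of one update per packet.
 */
static void shadow_flush(struct used_ring *used, struct shadow_used *s)
{
    uint16_t start = used->idx & (RING_SIZE - 1);
    uint16_t first = s->count;
    uint16_t rest;

    assert(s->count <= RING_SIZE);

    if ((uint32_t)start + first > RING_SIZE)
        first = (uint16_t)(RING_SIZE - start);
    rest = (uint16_t)(s->count - first);

    if (first > 0)
        memcpy(&used->ring[start], &s->elem[0], first * sizeof(s->elem[0]));
    if (rest > 0)
        memcpy(&used->ring[0], &s->elem[first], rest * sizeof(s->elem[0]));

    /* Make the entries visible before the index that announces them. */
    atomic_thread_fence(memory_order_release);
    used->idx = (uint16_t)(used->idx + s->count);
    s->count = 0;
}
```

In this sketch a caller would invoke shadow_add() once per enqueued packet
and shadow_flush() once at the end of the burst, so the cache line holding
the shared index is written once per burst rather than once per packet.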