From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stephen Hemminger <stephen@networkplumber.org>
Subject: Re: [PATCH 0/4] Optimize memcpy for AVX512 platforms
Date: Thu, 14 Jan 2016 08:48:32 -0800
Message-ID: <20160114084832.672fac86@xeon-e3>
References: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: dev@dpdk.org
To: Zhihong Wang <zhihong.wang@intel.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mail-pa0-f50.google.com (mail-pa0-f50.google.com
 [209.85.220.50]) by dpdk.org (Postfix) with ESMTP id 4F4BE379E
 for <dev@dpdk.org>; Thu, 14 Jan 2016 17:48:25 +0100 (CET)
Received: by mail-pa0-f50.google.com with SMTP id yy13so286840344pab.3
 for <dev@dpdk.org>; Thu, 14 Jan 2016 08:48:25 -0800 (PST)
In-Reply-To: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On Thu, 14 Jan 2016 01:13:18 -0500
Zhihong Wang <zhihong.wang@intel.com> wrote:

> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.
> 
> In current DPDK, memcpy holds a large proportion of execution time in
> libs like Vhost, especially for large packets, and this patch can bring
> considerable benefits.
> 
> The implementation is based on the current DPDK memcpy framework, some
> background introduction can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
> 
> Code changes are:
> 
>   1. Read CPUID to check if AVX512 is supported by CPU
> 
>   2. Predefine AVX512 macro if AVX512 is enabled by compiler
> 
>   3. Implement AVX512 memcpy and choose the right implementation based on
>      predefined macros
> 
>   4. Decide alignment unit for memcpy perf test based on predefined macros
> 
> Zhihong Wang (4):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
> 
>  app/test/test_memcpy_perf.c                        |   6 +
>  .../common/include/arch/x86/rte_cpuflags.h         |   2 +
>  .../common/include/arch/x86/rte_memcpy.h           | 247 ++++++++++++++++++++-
>  mk/rte.cpuflags.mk                                 |   4 +
>  4 files changed, 255 insertions(+), 4 deletions(-)
> 

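This really looks like code that could benefit from GCC
function multiversioning. The current cpuflags model is flawed
for real product deployment: the implementation is chosen at compile
time from predefined macros, so one binary cannot adapt to the CPU
it actually runs on.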
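
For illustration only, a minimal sketch of what multiversioning could
look like in C. It assumes GCC 6 or later on x86-64 Linux with ifunc
support, and uses the target_clones attribute. The function name
copy_bytes is hypothetical and not part of DPDK, and the body is a plain
byte loop rather than the rte_memcpy code from the patch set.

```c
#include <stddef.h>

/*
 * Assumption: target_clones needs GCC >= 6, an x86-64 target and ifunc
 * support in the C library. On other toolchains the guard below drops
 * the attribute and a single generic version is built instead.
 */
#if defined(__x86_64__) && defined(__linux__) && \
    defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 6)
#define COPY_MULTIVERSION \
	__attribute__((target_clones("avx512f", "avx2", "default")))
#else
#define COPY_MULTIVERSION
#endif

/*
 * Hypothetical stand-in for a copy routine. With the attribute active,
 * the compiler emits one clone per listed target plus a resolver that
 * picks a clone at load time based on the CPU the binary runs on.
 */
COPY_MULTIVERSION
void *copy_bytes(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	/* Simple loop; each clone may be vectorized for its own target. */
	for (i = 0; i < n; i++)
		d[i] = s[i];

	return dst;
}
```

Callers just call copy_bytes() as usual. The selection happens once in
the resolver, not through macros fixed at build time, so the same binary
can use AVX512 where it is present and fall back elsewhere.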