From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stephen Hemminger
Subject: Re: [PATCH 0/4] Optimize memcpy for AVX512 platforms
Date: Thu, 14 Jan 2016 08:48:32 -0800
Message-ID: <20160114084832.672fac86@xeon-e3>
In-Reply-To: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
References: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
To: Zhihong Wang
Cc: dev@dpdk.org
List-Id: patches and discussions about DPDK

On Thu, 14 Jan 2016 01:13:18 -0500
Zhihong Wang wrote:

> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.
>
> In current DPDK, memcpy holds a large proportion of execution time in
> libs like Vhost, especially for large packets, and this patch can bring
> considerable benefits.
>
> The implementation is based on the current DPDK memcpy framework, some
> background introduction can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
>
> Code changes are:
>
>   1. Read CPUID to check if AVX512 is supported by CPU
>
>   2. Predefine AVX512 macro if AVX512 is enabled by compiler
>
>   3. Implement AVX512 memcpy and choose the right implementation based on
>      predefined macros
>
>   4. Decide alignment unit for memcpy perf test based on predefined macros
>
> Zhihong Wang (4):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
>
>  app/test/test_memcpy_perf.c                        |   6 +
>  .../common/include/arch/x86/rte_cpuflags.h         |   2 +
>  .../common/include/arch/x86/rte_memcpy.h           | 247 ++++++++++++++++++++-
>  mk/rte.cpuflags.mk                                 |   4 +
>  4 files changed, 255 insertions(+), 4 deletions(-)
>

This really looks like code that could benefit from GCC function
multiversioning. The current cpuflags model is useless/flawed in real
product deployment
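
To make the multiversioning suggestion concrete, here is a rough sketch
of what it could look like in C using GCC's target_clones attribute.
This is not code from the patch set: copy_bytes(), copy_selfcheck() and
the COPY_CLONES macro are hypothetical names chosen for illustration,
and the sketch assumes x86-64 Linux with glibc (ifunc support) and a
GCC new enough to accept target_clones. On any other toolchain the
macro expands to nothing and a single generic version is built.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical macro: ask GCC to emit AVX512F, AVX2 and baseline clones
 * of the annotated function plus an ifunc resolver that picks one at
 * load time based on the CPU. Falls back to nothing elsewhere.
 */
#if defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__) && \
    defined(__GNUC__) && !defined(__clang__)
#define COPY_CLONES \
	__attribute__((target_clones("avx512f", "avx2", "default")))
#else
#define COPY_CLONES
#endif

/*
 * Hypothetical copy routine. The body is plain C; the compiler is free
 * to vectorize each clone with the instruction set named for it.
 */
COPY_CLONES
void *copy_bytes(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < n; i++)
		d[i] = s[i];
	return dst;
}

/* Small self-check: returns 0 if the copy matches the source. */
int copy_selfcheck(void)
{
	unsigned char src[257];
	unsigned char dst[257];
	size_t i;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)(i * 7u);
	memset(dst, 0, sizeof(dst));

	copy_bytes(dst, src, sizeof(src));
	return memcmp(dst, src, sizeof(src)) == 0 ? 0 : -1;
}
```

The point of this shape is that one binary carries all variants and the
choice is made on the machine it runs on, rather than being fixed by
macros predefined at build time. Callers just call copy_bytes() and
never test CPU flags themselves.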