From: "Yang, Zhiyong"
Subject: Re: [PATCH 1/4] eal/common: introduce rte_memset on IA platform
Date: Thu, 8 Dec 2016 07:41:43 +0000
To: Thomas Monjalon
Cc: "dev@dpdk.org", "yuanhan.liu@linux.intel.com", "Richardson, Bruce", "Ananyev, Konstantin", "De Lara Guarch, Pablo"
References: <1480926387-63838-1-git-send-email-zhiyong.yang@intel.com> <1480926387-63838-2-git-send-email-zhiyong.yang@intel.com> <7223515.9TZuZb6buy@xps13>
In-Reply-To: <7223515.9TZuZb6buy@xps13>
List-Id: DPDK patches and discussions

Hi Thomas,

Sorry for the late reply. I have been thinking about your suggestion.

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Friday, December 2, 2016 6:25 PM
> To: Yang, Zhiyong
> Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce;
> Ananyev, Konstantin; De Lara Guarch, Pablo
> Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> IA platform
>
> 2016-12-05 16:26, Zhiyong Yang:
> > +#ifndef _RTE_MEMSET_X86_64_H_
>
> Is this implementation specific to 64-bit?
>

Yes.

> > +
> > +#define rte_memset memset
> > +
> > +#else
> > +
> > +static void *
> > +rte_memset(void *dst, int a, size_t n);
> > +
> > +#endif
>
> If I understand well, rte_memset (as rte_memcpy) is using the most recent
> instructions available (and enabled) when compiling.
> It is not adapting the instructions to the run-time CPU.
> There is no need to downgrade at run-time the instruction set as it is
> obviously not a supported case, but it would be nice to be able to upgrade a
> "default compilation" at run-time as it is done in rte_acl.
> I explain this case more clearly for reference:
>
> We can have AVX512 supported in the compiler but disable it when compiling
> (CONFIG_RTE_MACHINE=snb) in order to build a binary running almost
> everywhere.
> When running this binary on a CPU having AVX512 support, it will not benefit
> of the AVX512 improvement.
> Though, we can compile an AVX512 version of some functions and use them
> only if the running CPU is capable.
> This kind of miracle can be achieved in two ways:
>
> 1/ For generic C code compiled with a recent GCC, a function can be built for
> several CPUs thanks to the attribute target_clones.
>
> 2/ For manually optimized functions using CPU-specific intrinsics or asm, it is
> possible to build them with non-default flags thanks to the attribute target.
>
> 3/ For manually optimized files using CPU-specific intrinsics or asm, we use
> specifics flags in the makefile.
>
> The function clone in case 1/ is dynamically chosen at run-time through ifunc
> resolver.
> The specific functions in cases 2/ and 3/ must chosen at run-time by
> initializing a function pointer thanks to rte_cpu_get_flag_enabled().
>
> Note that rte_hash and software crypto PMDs have a run-time check with
> rte_cpu_get_flag_enabled() but do not override CFLAGS in the Makefile.
> Next step for these libraries?
>
> Back to rte_memset, I think you should try the solution 2/.

I have read the ACL code. If I understand correctly, run-time selection is a
good idea for complex algorithm implementations, but choosing functions at
run time brings some overhead.
For a frequently called function that consumes only a few cycles, the overhead
may be larger than the gain the optimization brings. For example, in most DPDK
applications, memset only sets N = 10 or 12 bytes, which takes very few cycles.

Thanks
Zhiyong
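
P.S. To make the overhead concern concrete, here is a minimal sketch of what
solution 2/ might look like. It is only an illustration under assumptions, not
the patch code: every name in it (memset_fn, memset_generic, memset_avx2,
memset_select, my_memset) is hypothetical, and it uses the GCC builtin
__builtin_cpu_supports() in place of rte_cpu_get_flag_enabled() so that it
builds without DPDK.

```c
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none of them are DPDK APIs. */
typedef void *(*memset_fn)(void *dst, int c, size_t n);

/* Baseline variant, built with the default compiler flags. */
static void *
memset_generic(void *dst, int c, size_t n)
{
	return memset(dst, c, n);
}

#if defined(__GNUC__) && defined(__x86_64__)
/*
 * Variant built for AVX2 through the target attribute, whatever flags
 * are used for the rest of the file. A plain loop stands in for the
 * hand-written intrinsics a real implementation would use.
 */
static __attribute__((target("avx2"))) void *
memset_avx2(void *dst, int c, size_t n)
{
	unsigned char *p = dst;
	size_t i;

	for (i = 0; i < n; i++)
		p[i] = (unsigned char)c;
	return dst;
}
#endif

/* Function pointer holding the selected variant; generic by default. */
static memset_fn memset_impl = memset_generic;

/* Call once at startup: pick the variant matching the running CPU. */
static void
memset_select(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		memset_impl = memset_avx2;
#endif
}

/* Every call goes through the pointer, so it cannot be inlined. */
static inline void *
my_memset(void *dst, int c, size_t n)
{
	return memset_impl(dst, c, n);
}
```

The selection itself happens only once, but each call to my_memset() still
goes through an indirect call that the compiler cannot inline. For a 10 or 12
byte set, that call may cost as much as the stores themselves, which is the
overhead I am worried about.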