From: Yongseok Koh
Subject: Re: AVX512 bug on SkyLake
Date: Thu, 8 Nov 2018 23:01:03 +0000
References: <20181023212318.43082-1-yskoh@mellanox.com> <432F92CE-5714-45DC-B72F-CD8771DAFC89@intel.com> <1612642.At0RDolh7h@xps> <9d3f48fc-5a47-c813-1da8-7e1cab6bdd9e@intel.com>
In-Reply-To: <9d3f48fc-5a47-c813-1da8-7e1cab6bdd9e@intel.com>
To: Ferruh Yigit
Cc: Thomas Monjalon, "Wiles, Keith", dev, "Richardson, Bruce", Shahaf Shuler, "konstantin.ananyev@intel.com", "anatoly.burakov@intel.com", "stable@dpdk.org", "justin.parus@microsoft.com", "christian.ehrhardt@canonical.com", "david.coronel@canonical.com", "josh.powers@canonical.com", "jay.vosburgh@canonical.com", "dan.streetman@canonical.com"
List-Id: DPDK patches and discussions

> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit wrote:
>
> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
>> Hi,
>>
>> We need to gather more information about this bug.
>> More below.
>>
>> 07/11/2018 10:04, Wiles, Keith:
>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh wrote:
>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith wrote:
>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh wrote:
>>>>>>
>>>>>> This is a workaround to prevent a crash, which might be caused by
>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
>>>>>
>>>>> Should the code below not also test for the gcc version and
>>>>> the Sky Lake processor, maybe I am wrong but it seems it is
>>>>> turning off AVX512 for all GCC builds
>>>>
>>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
>>>> Also, I wasn't able to test every gcc version and I wanted to be a bit conservative about this crash.
>>>> A performance drop (if any) from disabling a new (experimental) feature would be less risky than an unaccountable crash.
>>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
>>>
>>> Are you not turning off AVX512 for all of the GCC versions?
>>> You can test for a range or for greater than a given GCC version, and
>>> it just seems like we are turning it off for every gcc version, is that true?
>>
>> Do we know exactly which GCC versions are affected?
>>
>>>>> Also bug 97 seems a bit obscure reference, maybe you know
>>>>> the bug report, but more details would be good?
>>>>
>>>> I sent out the report to the dev list two months ago.
>>>> And I created the Bug 97 in order to reference it
>>>> in the commit message.
>>>> I didn't want to repeat the same message here and there,
>>>> but it would've been better to have some sort of summary
>>>> of the Bug, although v3 has a few more words.
>>>> However, v3 has been merged.
>>>
>>> Still this is too obscure; if nothing else, give a link to
>>> a specific bug, not just 97.
>>
>> The URL is
>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D97&data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&sdata=2o%2Fg203aWrKCYg16S6oI4BcS41igpLu1DloS%2FrRnknc%3D&reserved=0
>> The bug is also pointing to an email:
>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2018-September%2F111522.html&data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&sdata=NCFKxaREd69iZ8eyFKg%2FWBP73CLTXkxrNQQeii%2Bbsao%3D&reserved=0
>>
>> Summary:
>> - CPU: Intel Skylake
>> - Linux environment: Ubuntu 18.04
>> - Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
>
> Is it possible to test a few other gcc versions to check if the issue is
> specific to this compiler version?

Nothing's impossible, but even with a quick search on gcc.gnu.org, I could find
that the documents for the following releases mention mavx512f support:

GCC 4.9.0   April 22, 2014   (changes, documentation)
GCC 5.1     April 22, 2015   (changes, documentation)
GCC 6.4     July 4, 2017     (changes, documentation)
GCC 7.1     May 2, 2017      (changes, documentation)
GCC 8.1     May 2, 2018      (changes, documentation)

Altogether, we would have to put in quite a large amount of resources to verify
all of the versions.

I assumed versions older than gcc 7 would have the same issue. I know it was
speculation, but like I mentioned, I wanted to be more conservative. I didn't
mean this is a permanent fix.

For two months, we couldn't come up with any tangible solution (actually nobody
cared, including myself), so I submitted the patch to temporarily disable mavx512f.

I'm still not sure what the best option is...

Thanks,
Yongseok

>
>> - Scenario: testpmd crashes when it starts forwarding
>> - Behaviour: AVX2 version of rte_memcpy() optimized with 512b instructions
>> - Fix: disable AVX512 optimization with -mno-avx512f
>>
>> It seems to have been reproduced only when using mlx5 PMD so far.
>> Any other experience?
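
For reference, the version-range test raised earlier in the thread could be
sketched in C roughly as below. This is only an illustration under stated
assumptions, not the merged patch: the helper names and the GCC_VERSION_CODE
macro are hypothetical, and the suspect range is set to exactly 7.3.0 because
that is the only version reported here. __GNUC__, __GNUC_MINOR__,
__GNUC_PATCHLEVEL__ and __AVX512F__ are standard GCC predefined macros.

```c
#include <stdbool.h>
#include <stdio.h>

/* Fold a GCC version into one comparable integer, e.g. 7.3.0 -> 70300. */
#define GCC_VERSION_CODE(major, minor, patch) \
	((major) * 10000 + (minor) * 100 + (patch))

/*
 * ASSUMPTION: hypothetical bounds. Only gcc 7.3.0 is reported in this
 * thread; widen the range once other versions have been tested.
 */
#define SUSPECT_GCC_MIN GCC_VERSION_CODE(7, 3, 0)
#define SUSPECT_GCC_MAX GCC_VERSION_CODE(7, 3, 0)

/* Hypothetical helper: true if the version lies inside the suspect range. */
bool gcc_version_is_suspect(int major, int minor, int patch)
{
	int code = GCC_VERSION_CODE(major, minor, patch);

	return code >= SUSPECT_GCC_MIN && code <= SUSPECT_GCC_MAX;
}

/*
 * True if this file was compiled with AVX512F code generation enabled.
 * GCC defines __AVX512F__ when -mavx512f is in effect.
 */
bool avx512f_codegen_enabled(void)
{
#ifdef __AVX512F__
	return true;
#else
	return false;
#endif
}

/* Print a one-line summary of both checks for the current build. */
void report_build_config(void)
{
#if defined(__GNUC__) && !defined(__clang__)
	printf("gcc %d.%d.%d: suspect=%d avx512f=%d\n",
	       __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__,
	       gcc_version_is_suspect(__GNUC__, __GNUC_MINOR__,
				      __GNUC_PATCHLEVEL__),
	       avx512f_codegen_enabled());
#else
	printf("not gcc: avx512f=%d\n", avx512f_codegen_enabled());
#endif
}
```

In a real build the decision would more likely be made in the build system,
where -mno-avx512f is appended to the compiler flags, but the comparison logic
would be the same.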