From: Thomas Monjalon <thomas@monjalon.net>
To: Ferruh Yigit <ferruh.yigit@intel.com>
Cc: stable@dpdk.org, Yongseok Koh <yskoh@mellanox.com>,
	keith.wiles@intel.com, dev@dpdk.org, bruce.richardson@intel.com,
	Shahaf Shuler <shahafs@mellanox.com>,
	konstantin.ananyev@intel.com, anatoly.burakov@intel.com,
	justin.parus@microsoft.com,
	"christian.ehrhardt@canonical.com"
	<christian.ehrhardt@canonical.com>,
	"david.coronel@canonical.com" <david.coronel@canonical.com>,
	"josh.powers@canonical.com" <josh.powers@canonical.com>,
	"jay.vosburgh@canonical.com" <jay.vosburgh@canonical.com>,
	"dan.streetman@canonical.com" <dan.streetman@canonical.com>
Subject: Re: [dpdk-stable] AVX512 bug on SkyLake
Date: Fri, 09 Nov 2018 14:17:31 +0100
Message-ID: <2236912.nQJHooqEX5@xps>
In-Reply-To: <9891a520-492a-2d18-65e7-c4fcd96a5e3f@intel.com>

09/11/2018 11:03, Ferruh Yigit:
> On 11/8/2018 11:01 PM, Yongseok Koh wrote:
> > 
> >> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >>
> >> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
> >>> Hi,
> >>>
> >>> We need to gather more information about this bug.
> >>> More below.
> >>>
> >>> 07/11/2018 10:04, Wiles, Keith:
> >>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> >>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>>>>>>
> >>>>>>> This is a workaround to prevent a crash, which might be caused by
> >>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
> >>>>>>
> >>>>>> Should the code below not also test for the gcc version and
> >>>>>> the Skylake processor? Maybe I am wrong, but it seems it is
> >>>>>> turning off AVX512 for all GCC builds.
> >>>>>
> >>>>> I didn't want to check the gcc version, as 7.3.0 is very new. Only gcc 8 has been released since then (gcc 8.2).
> >>>>> Also, I wasn't able to test every gcc version, and I wanted to be a bit conservative about this crash.
> >>>>> A performance drop (if any) from disabling a new (experimental) feature would be less risky than an unexplained crash.
> >>>>> And it disables the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
> >>>>
> >>>> Are you not turning off AVX512 for all GCC versions?
> >>>> You could test for a range of GCC versions, or for versions greater than a given one;
> >>>> it just seems like we are turning off every gcc version, is that true?
> >>>
> >>> Do we know exactly which GCC versions are affected?
> >>>
> >>>>>> Also, bug 97 seems a bit of an obscure reference. Maybe you know
> >>>>>> the bug report, but more details would be good.
> >>>>>
> >>>>> I sent out the report to the dev list two months ago.
> >>>>> And I created Bug 97 in order to reference it
> >>>>> in the commit message.
> >>>>> I didn't want to repeat the same message here and there,
> >>>>> but it would've been better to have some sort of summary
> >>>>> of the bug, although v3 has a few more words.
> >>>>> However, v3 has been merged.
> >>>>
> >>>> Still, this is too obscure; if nothing else, give a link to
> >>>> the specific bug, not just "97".
> >>>
> >>> The URL is
> >>> 	https://bugs.dpdk.org/show_bug.cgi?id=97
> >>> The bug also points to an email:
> >>> 	https://mails.dpdk.org/archives/dev/2018-September/111522.html
> >>>
> >>> Summary:
> >>> 	- CPU: Intel Skylake
> >>> 	- Linux environment: Ubuntu 18.04
> >>> 	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
> >>
> >> Is it possible to test a few other gcc versions to check if the issue is
> >> specific to this compiler version?
> > 
> > Nothing's impossible, but even with a quick search on gcc.gnu.org,
> > I found that the following release documents mention -mavx512f support:
> > 
> > GCC 4.9.0  April 22, 2014
> > GCC 5.1    April 22, 2015
> > GCC 6.4    July 4, 2017
> > GCC 7.1    May 2, 2017
> > GCC 8.1    May 2, 2018
> > 
> > Altogether, we would have to put in quite a large amount of resources to verify all of these versions.
> >  
> > I assumed versions older than gcc 7 would have the same issue. I know it was speculation,
> > but as I mentioned, I wanted to be more conservative. I didn't mean this to be a permanent fix.
> > For two months we had no tangible solution (actually nobody cared, including myself),
> > so I submitted the patch to temporarily disable -mavx512f.
> > 
> > I'm still not sure what the best option is...
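
Keith's suggestion of gating the workaround on a compiler version range, rather than on every gcc build, could look roughly like the C sketch below. It is only an illustration under assumptions: the helper names and the SUSPECT_GCC_MIN/SUSPECT_GCC_MAX bounds are hypothetical, and the 7.3.x range reflects only the single compiler reported in Bug 97 (gcc 7.3.0 on Ubuntu 18.04), not a confirmed list of affected versions. It also reports whether AVX512F code generation is enabled for the translation unit, using the __AVX512F__ macro that gcc defines when AVX512F is enabled (e.g. by -mavx512f).

```c
#include <stdbool.h>

/* Encode a gcc version as one comparable integer, e.g. 7.3.0 -> 70300. */
static int gcc_version_code(int major, int minor, int patch)
{
    return major * 10000 + minor * 100 + patch;
}

/* Hypothetical bounds: only the gcc 7.3.x series from Bug 97.
 * They would have to be adjusted after actually testing other releases. */
#define SUSPECT_GCC_MIN 70300
#define SUSPECT_GCC_MAX 70399

static bool gcc_version_is_suspect(int code)
{
    return code >= SUSPECT_GCC_MIN && code <= SUSPECT_GCC_MAX;
}

/* True when this translation unit is built by a gcc in the suspect range.
 * clang also defines __GNUC__, so it is explicitly excluded. */
static bool current_compiler_is_suspect(void)
{
#if defined(__GNUC__) && !defined(__clang__)
    return gcc_version_is_suspect(
        gcc_version_code(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__));
#else
    return false;
#endif
}

/* True when the compiler may emit AVX512F instructions for this unit
 * (e.g. built with -mavx512f or -march=skylake-avx512). */
static bool avx512f_codegen_enabled(void)
{
#ifdef __AVX512F__
    return true;
#else
    return false;
#endif
}
```

A gate would only apply the workaround when both current_compiler_is_suspect() and avx512f_codegen_enabled() are true; in practice such a decision would more likely live in the build system as a compiler-flag choice, with the bounds taken from real test results on the versions listed above.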
> 
> For a permanent fix we need more information; currently we can't reproduce this
> defect. Since you can reproduce it, we need your support.
> 
> Right now we don't know if this is a compiler issue, a code defect in rte_memcpy(),
> or something else.
> 
> It is easy to disable -mavx512f as a temporary solution, but it comes at the
> cost of a performance drop. Also, without knowing the actual root cause, I
> wouldn't call this conservative: the actual issue may just be hidden by
> this change.
> 
> I think the first thing we need is a way to reproduce this issue other
> than by using the mlx5 PMD, so that we can put a more organized effort into fixing it.
> I attached a simple unit test for rte_memcpy(); if this is an rte_memcpy() defect
> with avx512f as claimed, you should be able to see the issue with it, right?
> Were you able to find a chance to test it? Do you observe any crash there?

I am able to connect to a machine where the issue is reproduced.
I have tested replacing rte_memcpy with memcpy,
and the crash disappears when using memcpy.
So it confirms that the issue is in rte_memcpy.

About the unit test you attached in bugzilla:
	https://bugs.dpdk.org/attachment.cgi?id=15
It does not reproduce the bug:
	RTE>>rte_memcpy_autotest
	.................................................................
	Test OK
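
Since the attached autotest passes on that machine, a broader sweep might be worth trying. Below is a minimal standalone C sketch (not the bugzilla attachment) that checks a copy routine for every size up to a limit and every source/destination misalignment below a limit, and also checks canary bytes around the destination for out-of-bounds writes. It uses memcpy() as a self-check; in a DPDK build one would pass rte_memcpy instead, assuming its memcpy-like signature (void *dst, const void *src, size_t n). All function and macro names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as memcpy(); rte_memcpy() is assumed to match it. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

#define MAX_COPY_SIZE 4096 /* largest n the static buffers allow */
#define MAX_OFFSET      64 /* misalignments tested are 0..max_offset-1 */
#define GUARD           64 /* canary bytes checked past the copied region */
#define BUF_SIZE (MAX_OFFSET + MAX_COPY_SIZE + GUARD)
#define CANARY 0xA5

static uint8_t src_buf[BUF_SIZE];
static uint8_t dst_buf[BUF_SIZE];

/*
 * Sweep n = 0..max_size and src/dst offsets 0..max_offset-1.
 * Returns how many combinations copied wrong bytes or modified bytes
 * outside [dst, dst + n), or SIZE_MAX if the limits are too large.
 */
static size_t count_copy_errors(copy_fn copy, size_t max_size,
                                size_t max_offset)
{
    size_t errors = 0;
    size_t span;

    if (max_size > MAX_COPY_SIZE || max_offset > MAX_OFFSET)
        return SIZE_MAX;
    span = max_offset + max_size + GUARD; /* never exceeds BUF_SIZE */

    /* Varied byte pattern so misplaced or shifted bytes are usually caught. */
    for (size_t i = 0; i < BUF_SIZE; i++)
        src_buf[i] = (uint8_t)(i * 7 + 1);

    for (size_t n = 0; n <= max_size; n++) {
        for (size_t soff = 0; soff < max_offset; soff++) {
            for (size_t doff = 0; doff < max_offset; doff++) {
                int bad;

                memset(dst_buf, CANARY, span);
                copy(dst_buf + doff, src_buf + soff, n);

                /* The payload must match the source exactly. */
                bad = memcmp(dst_buf + doff, src_buf + soff, n) != 0;
                /* Bytes before and after the payload must be untouched. */
                for (size_t i = 0; i < doff && !bad; i++)
                    bad = dst_buf[i] != CANARY;
                for (size_t i = doff + n; i < span && !bad; i++)
                    bad = dst_buf[i] != CANARY;
                if (bad)
                    errors++;
            }
        }
    }
    return errors;
}
```

One caveat: calling the routine through a function pointer forces an out-of-line copy, whereas rte_memcpy() is normally inlined into its caller, where gcc may know n or the alignment at compile time. If the crash depends on how gcc optimizes the inlined code in the mlx5 PMD, neither this sweep nor the autotest may reproduce it; calling rte_memcpy() directly inside the loops would be a closer approximation.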

Thread overview: 30+ messages
2018-10-23 21:23 [PATCH] build: disable compiler AVX512F support Yongseok Koh
2018-11-01 23:11 ` Thomas Monjalon
2018-11-02 12:42 ` [dpdk-stable] " Ferruh Yigit
2018-11-02 13:48   ` Ferruh Yigit
2018-11-02 20:59     ` Yongseok Koh
2018-11-02 21:46       ` Ferruh Yigit
2018-11-02 23:31         ` Yongseok Koh
2018-11-02 21:04 ` [PATCH v2] " Yongseok Koh
2018-11-05 14:06   ` Wiles, Keith
2018-11-06 21:30     ` Yongseok Koh
2018-11-07  9:04       ` Wiles, Keith
2018-11-08 15:59         ` AVX512 bug on SkyLake Thomas Monjalon
2018-11-08 17:21           ` Ferruh Yigit
2018-11-08 23:01             ` Yongseok Koh
2018-11-09  6:27               ` Christian Ehrhardt
2018-11-09  9:49                 ` Ferruh Yigit
2018-11-09 11:35                   ` Thomas Monjalon
2018-11-09 10:03               ` Ferruh Yigit
2018-11-09 13:17                 ` Thomas Monjalon [this message]
2018-11-09 14:27                   ` [dpdk-stable] " Thomas Monjalon
2018-11-09 20:06                     ` Ferruh Yigit
2018-11-09 18:46           ` Stephen Hemminger
2018-11-10  2:13           ` [dpdk-stable] " Thomas Monjalon
2018-11-11 14:15             ` Ananyev, Konstantin
2018-11-11 18:15               ` Thomas Monjalon
2018-11-12  9:09                 ` Christian Ehrhardt
2018-11-12  9:21                   ` Thomas Monjalon
2018-11-12  9:26                 ` Ananyev, Konstantin
2018-11-03  1:06 ` [PATCH v3] build: disable gcc AVX512F support Yongseok Koh
2018-11-04 20:56   ` Thomas Monjalon
