Date: Wed, 18 Oct 2017 15:15:03 +0200 (CEST)
From: Thomas Gleixner
To: Linus Torvalds
cc: Joonsoo Kim, Josh Poimboeuf, kernel test robot, Ingo Molnar, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, "H. Peter Anvin", Jiri Slaby, Mike Galbraith, Peter Zijlstra, LKML, LKP, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton, Christoph Lameter
Subject: Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel
References: <20171010121513.GC5445@yexl-desktop> <20171011023106.izaulhwjcoam55jt@treble> <20171011170120.7flnk6r77dords7a@treble> <20171017073326.GA23865@js1304-P5Q-DELUXE>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 18 Oct 2017, Linus Torvalds wrote:
> On Tue, Oct 17, 2017 at 3:33 AM, Joonsoo Kim wrote:
> >
> > It looks like a compiler bug. The code of slob_units() tries to read two
> > bytes at ffff88001c4afffe. That is valid. But the compiler generates
> > wrong code that tries to read four bytes.
> >
> > static slobidx_t slob_units(slob_t *s)
> > {
> >         if (s->units > 0)
> >                 return s->units;
> >         return 1;
> > }
> >
> > s->units is defined as two bytes in this setup.
> >
> > Wrongly generated code for this part:
> >
> > 'mov 0x0(%rbp), %ebp'
> >
> > %ebp is four bytes.
> >
> > I guess that this wrong four-byte read crosses the valid memory
> > boundary and that is how this issue happened.
>
> Hmm. I can see why the compiler would do that (16-bit accesses are
> slow), but it's definitely wrong.
>
> Does it work ok if that slob_units() code is written as
>
> static slobidx_t slob_units(slob_t *s)
> {
>         int units = READ_ONCE(s->units);
>
>         if (units > 0)
>                 return units;
>         return 1;
> }
>
> which might be an acceptable workaround for now?

Discussed exactly that with Peter Zijlstra yesterday, but we came to the
conclusion that this is a whack-a-mole game. It might fix this slob issue,
but what guarantees that we don't have the same problem in some other
place? Just duct taping this particular instance makes me nervous.

Joonsoo says:

> gcc 4.8 and 4.9 fail to generate proper code. gcc 5.1 and
> the latest version work fine.

> I guess that this problem is related to a corner case of some
> optimization feature, since a minor code change makes the result
> different. And, with -O2, proper code is generated even if gcc 4.8 is
> used.

So it would be useful to figure out which optimization bit is causing that
and blacklist it for the affected compiler versions.

Thanks,

	tglx
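For readers following the address arithmetic in the quoted analysis, here is a minimal userspace sketch in C. It is not kernel code. It assumes 4 KiB pages and a signed 16-bit units field, and the names access_crosses_page(), read_units_once() and units_or_one() are hypothetical, made up for this illustration. The volatile cast only approximates what READ_ONCE() does for a scalar, and whether it avoids the bad code on gcc 4.8/4.9 is exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual on x86. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: does an access of `width` bytes starting at
 * `addr` touch more than one page?
 */
bool access_crosses_page(uint64_t addr, size_t width)
{
	assert(width > 0);

	uint64_t first_page = addr / SKETCH_PAGE_SIZE;
	uint64_t last_page = (addr + width - 1) / SKETCH_PAGE_SIZE;

	return first_page != last_page;
}

/*
 * Userspace stand-in for READ_ONCE() on a 16-bit field. A volatile
 * access is normally emitted with the declared width of the object,
 * so the compiler should not widen it to a 32-bit load.
 */
int16_t read_units_once(const int16_t *units)
{
	return *(const volatile int16_t *)units;
}

/* Same logic as the proposed slob_units(), on a bare 16-bit value. */
int units_or_one(const int16_t *units)
{
	int u = read_units_once(units);

	return u > 0 ? u : 1;
}
```

With the address from the report, access_crosses_page(0xffff88001c4afffeULL, 2) is false because the two bytes end at ...afff, the last byte of the page. The same call with a width of 4 is true because the load would end at ...b001 on the following page, which matches the fault Joonsoo describes.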