All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Hogan <james.hogan@imgtec.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
	Justin Chen <justinpopo6@gmail.com>,
	Justin Chen <justin.chen@broadcom.com>,
	<linux-mips@linux-mips.org>,
	<bcm-kernel-feedback-list@broadcom.com>,
	Ralf Baechle <ralf@linux-mips.org>
Subject: Re: Crash in -next due to 'MIPS: Add cacheinfo support'
Date: Fri, 10 Feb 2017 22:39:32 +0000	[thread overview]
Message-ID: <20170210223932.GA9246@jhogan-linux.le.imgtec.org> (raw)
In-Reply-To: <20170210221152.GA20435@roeck-us.net>

[-- Attachment #1: Type: text/plain, Size: 4217 bytes --]

On Fri, Feb 10, 2017 at 02:11:52PM -0800, Guenter Roeck wrote:
> On Fri, Feb 10, 2017 at 11:15:31AM -0800, Florian Fainelli wrote:
> > On 02/10/2017 09:46 AM, Guenter Roeck wrote:
> > > On Fri, Feb 10, 2017 at 10:39:52AM +0000, James Hogan wrote:
> > >> On Fri, Feb 10, 2017 at 09:40:11AM +0000, James Hogan wrote:
> > >>> Hi Guenter,
> > >>>
> > >>> On Thu, Feb 09, 2017 at 08:50:04PM -0800, Guenter Roeck wrote:
> > >>>> On 02/09/2017 04:01 PM, Justin Chen wrote:
> > >>>>> Hello Guenter,
> > >>>>>
> > >>>>> I am unable to reproduce the problem. May you give me more details?
> > >>>>>
> > >>>> The scripts I am using are available at
> > >>>>
> > >>>> https://github.com/groeck/linux-build-test/tree/master/rootfs
> > >>>>
> > >>>> in the mips and mipsel subdirectories (both crash). Individual
> > >>>> build logs are always available at kerneltests.org/builders,
> > >>>> in the 'next' column.
> > >>>
> > >>> Did you find it 100% reproducible? I was trying to reproduce yesterday
> > >>> evening, and did hit it a few times, but then it stopped happening
> > >>> before I could try and figure out what was going wrong.
> > >>
> > >> I switched to gcc 5.2 (buildroot toolchain) for building kernel, and now
> > >> it reproduces every time :)
> > >>
> > > gcc 5.4.0 (binutils 2.26.1) also reliably crashes. The exact crash depends on
> > > the kernel (-next is different to ToT + offending patch, qemu command line
> > > makes a difference, qemu version makes a difference), but the crash is easy
> > > to reproduce, at least for me.
> > 
> > Just to clarify here, is the crash a combination of:
> > 
> > - the kernel image, based on different trees/branches
> 
> I tried with linux-next, and I tried with ToT with the offending patch
> applied. Both fail reliably. Without the patch, both pass reliably.
> 
> > - different GCC versions
> 
> I can only say that I see it crashes with both gcc 5.2.0 and gcc 5.4.0.
> 
> > - different ways of invoking QEMU/QEMU version?
> > 
> Yes and no. One way of _not_ (or maybe not always) seeing this crash
> is to use the bare-bone command line with qemu 2.7 or 2.8 (with no
> root file system provided). This crashes because there is no root file
> system, but not as described earlier. It does crash, though, when
> providing a more comprehensive command line. All kernel versions
> without this patch do not crash.
> 
> On the other side, blaming this on the qemu command line is akin to
> saying that a crash only seen if a mouse is connected would be caused
> by the mouse, so I am not entirely sure if that helps too much.
> Also see below, regarding heisenbug.
> 
> > and essentially Justin's commit just made problem 1) to occur, but is
> > not the root cause of the crash you are seeing?
> 
> That would not necessarily be my conclusion. Of course, the code appears
> to be heavily SMP related, so it may well be that it exposes some
> problem associated with cache handling or support in non-SMP configurations.
> 
> Of course, it might also be possible that there is a qemu problem somewhere
> which only manifests itself on non-SMP mips images with Justin's commit
> applied. That appears to be somewhat unlikely, though I have no hard data
> supporting this guess.
> 
> I'll do some more testing and try to find the actual crash location.
> Tricky though since it almost looks like there is a not completely
> initialized workqueue. Making things worse, the problem "goes away"
> if I add some debug log into process_one_work(), meaning there may
> be a heisenbug.

cracked it by moving around an early return error. populate_cache()
macro has multiple statements with no do while (0) around it. The
c->scache.waysize condition in populate_cache_leaves then only
conditionalises the first statement in the macro and in absense of l2
(or l3 for that matter) it'll continue to write beyond the end of the
array allocated in detect_cache_attributes(). Badness ensues.

The SMP calls in arch/mips/kernel/cacheinfo.c file are pretty redundant
too since all the cache info is read from the cpu info structures.

I'll write a patch. Thanks for reporting Guenter!

Cheers
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: James Hogan <james.hogan@imgtec.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
	Justin Chen <justinpopo6@gmail.com>,
	Justin Chen <justin.chen@broadcom.com>,
	linux-mips@linux-mips.org, bcm-kernel-feedback-list@broadcom.com,
	Ralf Baechle <ralf@linux-mips.org>
Subject: Re: Crash in -next due to 'MIPS: Add cacheinfo support'
Date: Fri, 10 Feb 2017 22:39:32 +0000	[thread overview]
Message-ID: <20170210223932.GA9246@jhogan-linux.le.imgtec.org> (raw)
Message-ID: <20170210223932.C9rdMOOmbWXRrICB2ix28kbq94psctkx8EkoZlfOB2E@z> (raw)
In-Reply-To: <20170210221152.GA20435@roeck-us.net>

[-- Attachment #1: Type: text/plain, Size: 4217 bytes --]

On Fri, Feb 10, 2017 at 02:11:52PM -0800, Guenter Roeck wrote:
> On Fri, Feb 10, 2017 at 11:15:31AM -0800, Florian Fainelli wrote:
> > On 02/10/2017 09:46 AM, Guenter Roeck wrote:
> > > On Fri, Feb 10, 2017 at 10:39:52AM +0000, James Hogan wrote:
> > >> On Fri, Feb 10, 2017 at 09:40:11AM +0000, James Hogan wrote:
> > >>> Hi Guenter,
> > >>>
> > >>> On Thu, Feb 09, 2017 at 08:50:04PM -0800, Guenter Roeck wrote:
> > >>>> On 02/09/2017 04:01 PM, Justin Chen wrote:
> > >>>>> Hello Guenter,
> > >>>>>
> > >>>>> I am unable to reproduce the problem. May you give me more details?
> > >>>>>
> > >>>> The scripts I am using are available at
> > >>>>
> > >>>> https://github.com/groeck/linux-build-test/tree/master/rootfs
> > >>>>
> > >>>> in the mips and mipsel subdirectories (both crash). Individual
> > >>>> build logs are always available at kerneltests.org/builders,
> > >>>> in the 'next' column.
> > >>>
> > >>> Did you find it 100% reproducible? I was trying to reproduce yesterday
> > >>> evening, and did hit it a few times, but then it stopped happening
> > >>> before I could try and figure out what was going wrong.
> > >>
> > >> I switched to gcc 5.2 (buildroot toolchain) for building kernel, and now
> > >> it reproduces every time :)
> > >>
> > > gcc 5.4.0 (binutils 2.26.1) also reliably crashes. The exact crash depends on
> > > the kernel (-next is different to ToT + offending patch, qemu command line
> > > makes a difference, qemu version makes a difference), but the crash is easy
> > > to reproduce, at least for me.
> > 
> > Just to clarify here, is the crash a combination of:
> > 
> > - the kernel image, based on different trees/branches
> 
> I tried with linux-next, and I tried with ToT with the offending patch
> applied. Both fail reliably. Without the patch, both pass reliably.
> 
> > - different GCC versions
> 
> I can only say that I see it crashes with both gcc 5.2.0 and gcc 5.4.0.
> 
> > - different ways of invoking QEMU/QEMU version?
> > 
> Yes and no. One way of _not_ (or maybe not always) seeing this crash
> is to use the bare-bone command line with qemu 2.7 or 2.8 (with no
> root file system provided). This crashes because there is no root file
> system, but not as described earlier. It does crash, though, when
> providing a more comprehensive command line. All kernel versions
> without this patch do not crash.
> 
> On the other side, blaming this on the qemu command line is akin to
> saying that a crash only seen if a mouse is connected would be caused
> by the mouse, so I am not entirely sure if that helps too much.
> Also see below, regarding heisenbug.
> 
> > and essentially Justin's commit just made problem 1) to occur, but is
> > not the root cause of the crash you are seeing?
> 
> That would not necessarily be my conclusion. Of course, the code appears
> to be heavily SMP related, so it may well be that it exposes some
> problem associated with cache handling or support in non-SMP configurations.
> 
> Of course, it might also be possible that there is a qemu problem somewhere
> which only manifests itself on non-SMP mips images with Justin's commit
> applied. That appears to be somewhat unlikely, though I have no hard data
> supporting this guess.
> 
> I'll do some more testing and try to find the actual crash location.
> Tricky though since it almost looks like there is a not completely
> initialized workqueue. Making things worse, the problem "goes away"
> if I add some debug log into process_one_work(), meaning there may
> be a heisenbug.

cracked it by moving around an early return error. populate_cache()
macro has multiple statements with no do while (0) around it. The
c->scache.waysize condition in populate_cache_leaves then only
conditionalises the first statement in the macro and in absense of l2
(or l3 for that matter) it'll continue to write beyond the end of the
array allocated in detect_cache_attributes(). Badness ensues.

The SMP calls in arch/mips/kernel/cacheinfo.c file are pretty redundant
too since all the cache info is read from the cpu info structures.

I'll write a patch. Thanks for reporting Guenter!

Cheers
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

  reply	other threads:[~2017-02-10 22:39 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-08 23:45 Crash in -next due to 'MIPS: Add cacheinfo support' Guenter Roeck
2017-02-10  0:01 ` Justin Chen
2017-02-10  4:50   ` Guenter Roeck
2017-02-10  9:40     ` James Hogan
2017-02-10  9:40       ` James Hogan
2017-02-10 10:39       ` James Hogan
2017-02-10 10:39         ` James Hogan
2017-02-10 17:46         ` Guenter Roeck
2017-02-10 19:15           ` Florian Fainelli
2017-02-10 22:11             ` Guenter Roeck
2017-02-10 22:39               ` James Hogan [this message]
2017-02-10 22:39                 ` James Hogan
2017-02-10 22:43                 ` Florian Fainelli
2017-02-10 22:48                 ` Guenter Roeck
2017-02-10 13:30       ` Guenter Roeck
2017-02-10 23:01 ` [PATCH] MIPS: Fix cacheinfo overflow James Hogan
2017-02-10 23:01   ` James Hogan
2017-02-10 23:10   ` Justin Chen
2017-02-10 23:44     ` James Hogan
2017-02-10 23:44       ` James Hogan
2017-02-10 23:48       ` Justin Chen
2017-02-11  1:25   ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170210223932.GA9246@jhogan-linux.le.imgtec.org \
    --to=james.hogan@imgtec.com \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=f.fainelli@gmail.com \
    --cc=justin.chen@broadcom.com \
    --cc=justinpopo6@gmail.com \
    --cc=linux-mips@linux-mips.org \
    --cc=linux@roeck-us.net \
    --cc=ralf@linux-mips.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.