From: Guillaume Tucker <guillaume.tucker@collabora.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Mark Brown <broonie@kernel.org>,
	Tomeu Vizoso <tomeu.vizoso@collabora.com>,
	Matt Hart <matthew.hart@linaro.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	khilman@baylibre.com, enric.balletbo@collabora.com,
	Nicholas Piggin <npiggin@gmail.com>,
	Dominik Brodowski <linux@dominikbrodowski.net>,
	Masahiro Yamada <yamada.masahiro@socionext.com>,
	Kees Cook <keescook@chromium.org>, Adrian Reber <adrian@lisas.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux MM <linux-mm@kvack.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Richard Guy Briggs <rgb@redhat.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	info@kernelci.org
Subject: Re: next/master boot bisection: next-20190215 on beaglebone-black
Date: Wed, 6 Mar 2019 10:14:47 +0000
Message-ID: <36faea07-139c-b97d-3585-f7d6d362abc3@collabora.com>
In-Reply-To: <CAPcyv4hMNiiM11ULjbOnOf=9N=yCABCRsAYLpjXs+98bRoRpCA@mail.gmail.com>

On 01/03/2019 23:23, Dan Williams wrote:
> On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
>>
>> On 01/03/2019 20:41, Andrew Morton wrote:
>>> On Fri, 1 Mar 2019 09:25:24 +0100 Guillaume Tucker <guillaume.tucker@collabora.com> wrote:
>>>
>>>>>>> Michal had asked if the free space accounting fix up addressed this
>>>>>>> boot regression? I was awaiting word on that.
>>>>>>
>>>>>> hm, does bot@kernelci.org actually read emails?  Let's try info@ as well..
>>>>
>>>> bot@kernelci.org is not a person; it's a send-only account for
>>>> automated reports.  So no, it doesn't read emails.
>>>>
>>>> I guess the tricky point here is that the authors of the commits
>>>> found by bisections may not always have the hardware needed to
>>>> reproduce the problem.  So it needs to be dealt with on a
>>>> case-by-case basis: sometimes they do have the hardware,
>>>> sometimes someone else on the list or on CC does, and sometimes
>>>> it's better for the people who have access to the test lab which
>>>> ran the KernelCI test to deal with it.
>>>>
>>>> This case seems to fall into the last category.  As I have access
>>>> to the Collabora lab, I can do some quick checks to confirm
>>>> whether the proposed patch does fix the issue.  I hadn't realised
>>>> that someone was waiting for this to happen, especially as the
>>>> BeagleBone Black is a very common platform.  Sorry about that,
>>>> I'll take a look today.
>>>>
>>>> It may be a nice feature to be able to give access to the
>>>> KernelCI test infrastructure to anyone who wants to debug an
>>>> issue reported by KernelCI or verify a fix, so they won't need to
>>>> have the hardware locally.  Something to think about for the
>>>> future.
>>>
>>> Thanks, that all sounds good.
>>>
>>>>>> Is it possible to determine whether this regression is still present in
>>>>>> current linux-next?
>>>>
>>>> I'll try to re-apply the patch that caused the issue, then see if
>>>> the suggested change fixes it.  As far as the current linux-next
>>>> master branch is concerned, KernelCI boot tests are passing fine
>>>> on that platform.
>>>
>>> They would, because I dropped
>>> mm-shuffle-default-enable-all-shuffling.patch, so your tests presumably
>>> now have shuffling disabled.
>>>
>>> Is it possible to add the below to linux-next and try again?
>>
>> I've actually already done that, and essentially the issue can
>> still be reproduced by applying that patch.  See this branch:
>>
>>   https://gitlab.collabora.com/gtucker/linux/commits/next-20190301-beaglebone-black-debug
>>
>> next-20190301 boots fine but the head of that branch fails,
>> using multi_v7_defconfig + SMP=n in both cases, with
>> SHUFFLE_PAGE_ALLOCATOR=y enabled in the second case as a
>> result of the change in its default value.
>>
>> The change suggested by Michal Hocko on Feb 15th has now been
>> applied in linux-next as part of this commit, but as explained
>> above it does not actually resolve the boot failure:
>>
>>   98cf198ee8ce mm: move buddy list manipulations into helpers
>>
>> I can send more details on Monday and do a bit of debugging to
>> help narrow down the problem.  Please let me know if there's
>> anything in particular that seems worth trying.
>>
> 
> Thanks for taking a look!
> 
> Some questions when you get a chance:
> 
> Is there an early-printk facility that can be turned on to see how far
> we get in the boot?

Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
adding earlyprintk to the kernel command line.  Here's the result,
with the commit cherry-picked on top of next-20190304:

  https://lava.collabora.co.uk/scheduler/job/1526326

[    1.379522] ti-sysc 4804a000.target-module: sysc_flags 00000222 != 00000022
[    1.396718] Unable to handle kernel paging request at virtual address 77bb4003
[    1.404203] pgd = (ptrval)
[    1.406971] [77bb4003] *pgd=00000000
[    1.410650] Internal error: Oops: 5 [#1] ARM
[...]
[    1.672310] [<c07051a0>] (clk_hw_create_clk.part.21) from [<c06fea34>] (devm_clk_get+0x4c/0x80)
[    1.681232] [<c06fea34>] (devm_clk_get) from [<c064253c>] (sysc_probe+0x28c/0xde4)

It always fails at that point in the code.  Also, when "debug"
is enabled on the kernel command line, the issue goes away (with
exactly the same binaries etc.):

  https://lava.collabora.co.uk/scheduler/job/1526327

For the record, here's the branch I've been using:

  https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug

The board otherwise boots fine with next-20190304 (SMP=n), and
also with the patch applied but the shuffle configs set to n.

> Do any of the QEMU machine types [1] approximate this board? I.e. so I
> might be able to independently debug.

Unfortunately there doesn't appear to be any QEMU machine
emulating the TI AM335x SoC or the BeagleBone Black board.

> Were there any boot *successes* on ARM with shuffling enabled? I.e.
> clues about what's different about the specific memory setup for
> beagle-bone-black.

Looking at the KernelCI results from next-20190215, it looks like
only the BeagleBone Black with SMP=n failed to boot:

  https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/

Of course, that doesn't cover all the ARM boards out there, but
it's already fairly broad coverage.

As the kernel panic always seems to originate in ti-sysc.c,
there's a chance it's only visible on that platform.  I'm now
doing a KernelCI run with my test branch to double-check that;
it'll take a few hours, so I'll send an update later if I get
anything useful out of it.

In the meantime, I'm happy to try out other things with more
debug configs turned on or any potential fixes someone might
have.

Thanks,
Guillaume

> Thanks for the help!
> 
> [1]: https://wiki.qemu.org/Documentation/Platforms/ARM


