From: Guillaume Tucker <guillaume.tucker@collabora.com>
To: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Mark Brown <broonie@kernel.org>,
	Tomeu Vizoso <tomeu.vizoso@collabora.com>,
	Matt Hart <matthew.hart@linaro.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	khilman@baylibre.com, enric.balletbo@collabora.com,
	Nicholas Piggin <npiggin@gmail.com>,
	Dominik Brodowski <linux@dominikbrodowski.net>,
	Masahiro Yamada <yamada.masahiro@socionext.com>,
	Kees Cook <keescook@chromium.org>, Adrian Reber <adrian@lisas.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux MM <linux-mm@kvack.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Richard Guy Briggs <rgb@redhat.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	info@kernelci.org
Subject: Re: next/master boot bisection: next-20190215 on beaglebone-black
Date: Thu, 7 Mar 2019 09:16:20 +0000
Message-ID: <21d138a5-13e4-9e83-d7fe-e0639a8d180a@collabora.com>
In-Reply-To: <20190306140529.GG3549@rapoport-lnx>

On 06/03/2019 14:05, Mike Rapoport wrote:
> On Wed, Mar 06, 2019 at 10:14:47AM +0000, Guillaume Tucker wrote:
>> On 01/03/2019 23:23, Dan Williams wrote:
>>> On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
>>> <guillaume.tucker@collabora.com> wrote:
>>>
>>> Is there an early-printk facility that can be turned on to see how far
>>> we get in the boot?
>>
>> Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
>> earlyprintk in the command line.  Here's the result, with the
>> commit cherry picked on top of next-20190304:
>>
>>   https://lava.collabora.co.uk/scheduler/job/1526326
>>
>> [    1.379522] ti-sysc 4804a000.target-module: sysc_flags 00000222 != 00000022
>> [    1.396718] Unable to handle kernel paging request at virtual address 77bb4003
>> [    1.404203] pgd = (ptrval)
>> [    1.406971] [77bb4003] *pgd=00000000
>> [    1.410650] Internal error: Oops: 5 [#1] ARM
>> [...]
>> [    1.672310] [<c07051a0>] (clk_hw_create_clk.part.21) from [<c06fea34>] (devm_clk_get+0x4c/0x80)
>> [    1.681232] [<c06fea34>] (devm_clk_get) from [<c064253c>] (sysc_probe+0x28c/0xde4)
>>
>> It's always failing at that point in the code.  Also, when
>> enabling "debug" on the kernel command line, the issue goes
>> away (exact same binaries, etc.):
>>
>>   https://lava.collabora.co.uk/scheduler/job/1526327
>>
>> For the record, here's the branch I've been using:
>>
>>   https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug
>>
>> The board otherwise boots fine with next-20190304 (SMP=n), and
>> also with the patch applied but the shuffle configs set to n.
>>
>>> Were there any boot *successes* on ARM with shuffling enabled? I.e.
>>> clues about what's different about the specific memory setup for
>>> beagle-bone-black.
>>
>> Looking at the KernelCI results from next-20190215, it looks like
>> only the BeagleBone Black with SMP=n failed to boot:
>>
>>   https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/
>>
>> Of course that's not all the ARM boards that exist out there, but
>> it's a fairly large coverage already.
>>
>> As the kernel panic always seems to originate in ti-sysc.c,
>> there's a chance it's only visible on that platform...  I'm doing
>> a KernelCI run now with my test branch to double check that,
>> it'll take a few hours so I'll send an update later if I get
>> anything useful out of it.

Here's the result.  There were a couple of failures, but some were
due to infrastructure errors (nyan-big) and I'm not sure what the
problem was with the meson boards:

  https://staging.kernelci.org/boot/all/job/gtucker/branch/kernelci-local/kernel/next-20190304-1-g4f0b547b03da/

So there's no clear indication that the shuffle config is causing
any issue on any platform other than the BeagleBone Black.

>> In the meantime, I'm happy to try out other things with more
>> debug configs turned on or any potential fixes someone might
>> have.
> 
> ARM is the only arch that sets ARCH_HAS_HOLES_MEMORYMODEL to 'y'. Maybe the
> failure has something to do with it...
> 
> Guillaume, can you try this patch:

Sure, but it doesn't seem to fix the problem:

  https://lava.collabora.co.uk/scheduler/job/1527471

I've added the patch to the same branch based on next-20190304.

I guess this needs to be debugged a little further to see what
the panic is really about.  I'll see if I can spend a bit more
time on it this week, unless a BeagleBone expert is available
to help or someone has another fix to try out.

Guillaume

> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index 3ce1248..4a04aac 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -58,7 +58,8 @@ module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
>   * For two pages to be swapped in the shuffle, they must be free (on a
>   * 'free_area' lru), have the same order, and have the same migratetype.
>   */
> -static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
> +static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order,
> +						  struct zone *z)
>  {
>  	struct page *page;
>  
> @@ -80,6 +81,9 @@ static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
>  	if (!PageBuddy(page))
>  		return NULL;
>  
> +	if (!memmap_valid_within(pfn, page, z))
> +		return NULL;
> +
>  	/*
>  	 * ...is the page on the same list as the page we will
>  	 * shuffle it with?
> @@ -123,7 +127,7 @@ void __meminit __shuffle_zone(struct zone *z)
>  		 * page_j randomly selected in the span @zone_start_pfn to
>  		 * @spanned_pages.
>  		 */
> -		page_i = shuffle_valid_page(i, order);
> +		page_i = shuffle_valid_page(i, order, z);
>  		if (!page_i)
>  			continue;
>  
> @@ -137,7 +141,7 @@ void __meminit __shuffle_zone(struct zone *z)
>  			j = z->zone_start_pfn +
>  				ALIGN_DOWN(get_random_long() % z->spanned_pages,
>  						order_pages);
> -			page_j = shuffle_valid_page(j, order);
> +			page_j = shuffle_valid_page(j, order, z);
>  			if (page_j && page_j != page_i)
>  				break;
>  		}
>  
> 
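
For anyone following the quoted patch, here is a rough userspace
sketch of how the candidate pfn "j" is derived in the last hunk: a
random offset within the zone span, rounded down to a multiple of
the order's page count.  It is not the kernel code.  The names
sketch_pick_pfn() and SKETCH_ALIGN_DOWN() are made up for
illustration, the random value is passed in rather than taken from
get_random_long(), and the alignment macro assumes a power-of-two
size.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's ALIGN_DOWN(); this version
 * is only valid when 'a' is a power of two.
 */
#define SKETCH_ALIGN_DOWN(x, a) ((x) & ~((a) - 1UL))

/*
 * Mirror of the arithmetic in the quoted hunk:
 *   j = zone_start_pfn + ALIGN_DOWN(rnd % spanned_pages, order_pages)
 * 'rnd' replaces get_random_long() so the result is reproducible.
 */
static unsigned long sketch_pick_pfn(unsigned long zone_start_pfn,
				     unsigned long spanned_pages,
				     unsigned int order,
				     unsigned long rnd)
{
	unsigned long order_pages = 1UL << order;

	/* A zone with no pages would make the modulo undefined. */
	assert(spanned_pages != 0);

	return zone_start_pfn +
		SKETCH_ALIGN_DOWN(rnd % spanned_pages, order_pages);
}
```

With illustrative values (a zone starting at pfn 0x80000 that spans
0x20000 pages, order 10), any random input yields a pfn inside the
span whose offset from the zone start is a multiple of 1024 pages.
Whether a valid, free page actually sits at that pfn is a separate
question, which is what shuffle_valid_page() checks in the patch.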

