linux-mm.kvack.org archive mirror
* Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
       [not found] <5fd3e5d9.1c69fb81.f9e69.5028@mx.google.com>
@ 2020-12-11 21:53 ` Guillaume Tucker
  2020-12-13  8:23   ` Mike Rapoport
  0 siblings, 1 reply; 9+ messages in thread
From: Guillaume Tucker @ 2020-12-11 21:53 UTC (permalink / raw)
  To: Andrea Arcangeli, Mike Rapoport, Andrew Morton, Stephen Rothwell,
	kernelci-results-staging, kernelci-results
  Cc: linux-mm, linux-kernel, Mike Rapoport, Baoquan He

Hi Mike,

Please see the bisection report below about a boot failure on
rk3288 with next-20201210.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

There's nothing in the serial console log, probably because it's
crashing too early during boot.  This was confirmed on two rk3288
platforms on kernelci.org: rk3288-veyron-jaq and
rk3288-rock2-square.  There's no clear sign of other platforms
being impacted.

If this looks like something you want to investigate but you
don't have a platform at hand to reproduce it, please let us know
if you would like the test to be re-run on kernelci.org with some
debug config turned on, or if you have a fix to try.

Thanks,
Guillaume

On 11/12/2020 21:34, staging.kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
> 
> Summary:
>   Start:      7f507faf2d85 staging-next-20201211.0

This is really next-20201210...  The revision shown here is just
an artifact of staging.kernelci.org, which creates its own tags.

>   Plain log:  https://storage.staging.kernelci.org/kernelci/staging-next/staging-next-20201211.0/arm/multi_v7_defconfig/gcc-8/lab-collabora/sleep-rk3288-rock2-square.txt
>   HTML log:   https://storage.staging.kernelci.org/kernelci/staging-next/staging-next-20201211.0/arm/multi_v7_defconfig/gcc-8/lab-collabora/sleep-rk3288-rock2-square.html
>   Result:     950c37691925 mm: memblock: enforce overlap of memory.memblock and memory.reserved
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       kernelci
>   URL:        https://github.com/kernelci/linux.git
>   Branch:     staging-next
>   Target:     rk3288-rock2-square
>   CPU arch:   arm
>   Lab:        lab-collabora
>   Compiler:   gcc-8
>   Config:     multi_v7_defconfig
>   Test case:  sleep.login
> 
> Breaking commit found:
> 
> -------------------------------------------------------------------------------
> commit 950c3769192512118a87432dd42e71c5241dbd10
> Author: Mike Rapoport <rppt@linux.ibm.com>
> Date:   Thu Dec 10 15:40:51 2020 +1100
> 
>     mm: memblock: enforce overlap of memory.memblock and memory.reserved
>     
>     Patch series "mm: fix initialization of struct page for holes in  memory layout", v2.
>     
>     Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
>     rather that check each PFN") exposed several issues with the memory map
>     initialization and these patches fix those issues.
>     
>     Initially there were crashes during compaction that Qian Cai reported back
>     in April [1].  It seemed back then that the problem was fixed, but a few
>     weeks ago Andrea Arcangeli hit the same bug [2] and after a long
>     discussion between us [3] I think these patches are the proper fix.
>     
>     [1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
>     [2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
>     [3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foundation.org
>     
>     This patch (of 2):
>     
>     memblock does not require that the reserved memory ranges be a subset
>     of memblock.memory.
>     
>     As a result there may be reserved pages that are not in the range of any
>     zone or node because zone and node boundaries are detected based on
>     memblock.memory and pages that are only present in memblock.reserved are not
>     taken into account during zone/node size detection.
>     
>     Make sure that all ranges in memblock.reserved are added to
>     memblock.memory before calculating node and zone boundaries.
>     
>     Link: https://lkml.kernel.org/r/20201209214304.6812-1-rppt@kernel.org
>     Link: https://lkml.kernel.org/r/20201209214304.6812-2-rppt@kernel.org
>     Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
>     Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
>     Reported-by: Andrea Arcangeli <aarcange@redhat.com>
>     Cc: Baoquan He <bhe@redhat.com>
>     Cc: David Hildenbrand <david@redhat.com>
>     Cc: Mel Gorman <mgorman@suse.de>
>     Cc: Michal Hocko <mhocko@kernel.org>
>     Cc: Qian Cai <cai@lca.pw>
>     Cc: Vlastimil Babka <vbabka@suse.cz>
>     Cc: <stable@vger.kernel.org>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>     Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index ef131255cedc..e64dae2dd1ce 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -120,6 +120,7 @@ int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
>  unsigned long memblock_free_all(void);
>  void reset_node_managed_pages(pg_data_t *pgdat);
>  void reset_all_zones_managed_pages(void);
> +void memblock_enforce_memory_reserved_overlap(void);
>  
>  /* Low level functions */
>  void __next_mem_range(u64 *idx, int nid, enum memblock_flags flags,
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 049df4163a97..18432bc166f6 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1857,6 +1857,30 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
>  	}
>  }
>  
> +/**
> + * memblock_enforce_memory_reserved_overlap - make sure every range in
> + * @memblock.reserved is covered by @memblock.memory
> + *
> + * The data in @memblock.memory is used to detect zone and node boundaries
> + * during initialization of the memory map and the page allocator. Make
> + * sure that every memory range present in @memblock.reserved is also added
> + * to @memblock.memory even if the architecture specific memory
> + * initialization failed to do so
> + */
> +void __init memblock_enforce_memory_reserved_overlap(void)
> +{
> +	phys_addr_t start, end;
> +	int nid;
> +	u64 i;
> +
> +	__for_each_mem_range(i, &memblock.reserved, &memblock.memory,
> +			     NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, &nid) {
> +		pr_warn("memblock: reserved range [%pa-%pa] is not in memory\n",
> +			&start, &end);
> +		memblock_add_node(start, (end - start), nid);
> +	}
> +}
> +
>  void __init_memblock memblock_set_current_limit(phys_addr_t limit)
>  {
>  	memblock.current_limit = limit;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7c8ead3da355..f117460d6223 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7507,6 +7507,13 @@ void __init free_area_init(unsigned long *max_zone_pfn)
>  	memset(arch_zone_highest_possible_pfn, 0,
>  				sizeof(arch_zone_highest_possible_pfn));
>  
> +	/*
> +	 * Some architectures (e.g. x86) have reserved pages outside of
> +	 * memblock.memory. Make sure these pages are taken into account
> +	 * when detecting zone and node boundaries
> +	 */
> +	memblock_enforce_memory_reserved_overlap();
> +
>  	start_pfn = find_min_pfn_with_active_regions();
>  	descending = arch_has_descending_max_zone_pfns();
> -------------------------------------------------------------------------------
> 
> 
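The patch above relies on `__for_each_mem_range()` walking the parts of
`memblock.reserved` that are not covered by `memblock.memory`. Below is a
minimal userspace sketch of that range subtraction. It is not kernel
code: `struct range` and `uncovered_parts()` are hypothetical names used
only for illustration, and the sketch assumes the memory ranges are
sorted and non-overlapping, as memblock keeps them.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical half-open physical address range [start, end). */
struct range {
	phys_addr_t start;
	phys_addr_t end;
};

/*
 * Store in @out the parts of @r that no range in @mem covers and return
 * how many were stored (at most @max_out).  @mem must be sorted by start
 * address and non-overlapping, as memblock keeps memblock.memory.
 */
static int uncovered_parts(struct range r, const struct range *mem,
			   int nr_mem, struct range *out, int max_out)
{
	phys_addr_t cur = r.start;	/* lowest address not yet examined */
	int n = 0;

	assert(r.start <= r.end);
	for (int i = 0; i < nr_mem && cur < r.end; i++) {
		if (mem[i].end <= cur)
			continue;	/* region lies entirely below cur */
		if (mem[i].start >= r.end)
			break;		/* this and all later regions lie above r */
		if (mem[i].start > cur && n < max_out)
			out[n++] = (struct range){ cur, mem[i].start };
		cur = mem[i].end;	/* skip the covered part */
	}
	if (cur < r.end && n < max_out)
		out[n++] = (struct range){ cur, r.end };
	return n;
}
```

For example, with memory holding only [0x0, 0x80000000) (an assumed
2 GiB layout), `uncovered_parts()` returns nothing for a reserved range
inside RAM, but returns the whole of [0xfe000000, 0xff000000) for the
rk3288 block discussed later in this thread. That uncovered range is
what the patch would then hand to `memblock_add_node()`.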
> Git bisection log:
> 
> -------------------------------------------------------------------------------
> git bisect start
> # good: [69fe24d1d80feac4289778582cf0a15256d59baf] firmware: xilinx: Mark pm_api_features_map with static keyword
> git bisect good 69fe24d1d80feac4289778582cf0a15256d59baf
> # bad: [7f507faf2d8592f0f4455728dd08986ec6cc7b0e] staging-next-20201211.0
> git bisect bad 7f507faf2d8592f0f4455728dd08986ec6cc7b0e
> # good: [4baeae4883ba51406bb4f06c886d61440628adb7] Merge remote-tracking branch 'crypto/master'
> git bisect good 4baeae4883ba51406bb4f06c886d61440628adb7
> # good: [593b02d9998c2ae111b2afd9205b5be094b1a69e] Merge remote-tracking branch 'spi/for-next'
> git bisect good 593b02d9998c2ae111b2afd9205b5be094b1a69e
> # good: [69f315daea3d7943175d7570576fd21bef3965c2] Merge remote-tracking branch 'staging/staging-next'
> git bisect good 69f315daea3d7943175d7570576fd21bef3965c2
> # good: [fce046ce7d0944b02fcd190b26d995ab2dd3c5fd] Merge remote-tracking branch 'userns/for-next'
> git bisect good fce046ce7d0944b02fcd190b26d995ab2dd3c5fd
> # bad: [df3f2557282cba0311b47d886032650cf45e449f] rapidio: remove unused rio_get_asm() and rio_get_device()
> git bisect bad df3f2557282cba0311b47d886032650cf45e449f
> # good: [176232b371b0ab0e970e80879f851fe529be8ef0] mm/page_alloc: clear all pages in post_alloc_hook() with init_on_alloc=1
> git bisect good 176232b371b0ab0e970e80879f851fe529be8ef0
> # bad: [e1a24938fc628aa51933262a0a4af3bd3085e4df] zram: break the strict dependency from lzo
> git bisect bad e1a24938fc628aa51933262a0a4af3bd3085e4df
> # bad: [23b1d94b7bd7db1903686c4f2364b942181db887] mm: make pagecache tagged lookups return only head pages
> git bisect bad 23b1d94b7bd7db1903686c4f2364b942181db887
> # good: [d9f9370b97e3b7b84e92870d12fa17b9a346bc44] mm/vmscan.c: remove the filename in the top of file comment
> git bisect good d9f9370b97e3b7b84e92870d12fa17b9a346bc44
> # bad: [0c675604b0b47efb3281ffa66ede56036bb674b7] mm-fix-initialization-of-struct-page-for-holes-in-memory-layout-checkpatch-fixes
> git bisect bad 0c675604b0b47efb3281ffa66ede56036bb674b7
> # good: [d6c1578855ee5805c673638648e5ded4364a2649] z3fold: remove preempt disabled sections for RT
> git bisect good d6c1578855ee5805c673638648e5ded4364a2649
> # good: [d9387865b7499bbd03e905c7170efba840ba6505] mm/compaction: make defer_compaction and compaction_deferred static
> git bisect good d9387865b7499bbd03e905c7170efba840ba6505
> # bad: [bdc54c457d8b2b5883b7223c52b5b451538a70a3] mm: fix initialization of struct page for holes in memory layout
> git bisect bad bdc54c457d8b2b5883b7223c52b5b451538a70a3
> # bad: [950c3769192512118a87432dd42e71c5241dbd10] mm: memblock: enforce overlap of memory.memblock and memory.reserved
> git bisect bad 950c3769192512118a87432dd42e71c5241dbd10
> # first bad commit: [950c3769192512118a87432dd42e71c5241dbd10] mm: memblock: enforce overlap of memory.memblock and memory.reserved
> -------------------------------------------------------------------------------
> 
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
  2020-12-11 21:53 ` kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging Guillaume Tucker
@ 2020-12-13  8:23   ` Mike Rapoport
  2020-12-18 21:59     ` Guillaume Tucker
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Rapoport @ 2020-12-13  8:23 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Andrea Arcangeli, Andrew Morton, Stephen Rothwell,
	kernelci-results-staging, kernelci-results, linux-mm,
	linux-kernel, Mike Rapoport, Baoquan He

Hi Guillaume,

On Fri, Dec 11, 2020 at 09:53:46PM +0000, Guillaume Tucker wrote:
> Hi Mike,
> 
> Please see the bisection report below about a boot failure on
> rk3288 with next-20201210.
> 
> Reports aren't automatically sent to the public while we're
> trialing new bisection features on kernelci.org but this one
> looks valid.
> 
> There's nothing in the serial console log, probably because it's
> crashing too early during boot.  This was confirmed on two rk3288
> platforms on kernelci.org: rk3288-veyron-jaq and
> rk3288-rock2-square.  There's no clear sign of other platforms
> being impacted.
> 
> If this looks like something you want to investigate but you
> don't have a platform at hand to reproduce it, please let us know
> if you would like the test to be re-run on kernelci.org with some
> debug config turned on, or if you have a fix to try.

I'd appreciate it if you could build a working kernel with
CONFIG_DEBUG_MEMORY_INIT=y and run it with

	memblock=debug mminit_loglevel=4

in the command line.

If I understand correctly, DEBUG_LL is not an option for these platforms
so if earlyprintk didn't display the log there is not much to do about
it.

> Thanks,
> Guillaume

-- 
Sincerely yours,
Mike.



* Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
  2020-12-13  8:23   ` Mike Rapoport
@ 2020-12-18 21:59     ` Guillaume Tucker
  2021-01-03 13:47       ` Mike Rapoport
  0 siblings, 1 reply; 9+ messages in thread
From: Guillaume Tucker @ 2020-12-18 21:59 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrea Arcangeli, Andrew Morton, Stephen Rothwell,
	kernelci-results-staging, kernelci-results, linux-mm,
	linux-kernel, Mike Rapoport, Baoquan He

On 13/12/2020 08:23, Mike Rapoport wrote:
> Hi Guillaume,
> 
> On Fri, Dec 11, 2020 at 09:53:46PM +0000, Guillaume Tucker wrote:
>> Hi Mike,
>>
>> Please see the bisection report below about a boot failure on
>> rk3288 with next-20201210.
>>
>> Reports aren't automatically sent to the public while we're
>> trialing new bisection features on kernelci.org but this one
>> looks valid.
>>
>> There's nothing in the serial console log, probably because it's
>> crashing too early during boot.  This was confirmed on two rk3288
>> platforms on kernelci.org: rk3288-veyron-jaq and
>> rk3288-rock2-square.  There's no clear sign of other platforms
>> being impacted.
>>
>> If this looks like something you want to investigate but you
>> don't have a platform at hand to reproduce it, please let us know
>> if you would like the test to be re-run on kernelci.org with some
>> debug config turned on, or if you have a fix to try.
> 
> I'd appreciate it if you could build a working kernel with
> CONFIG_DEBUG_MEMORY_INIT=y and run it with
> 
> 	memblock=debug mminit_loglevel=4
> 
> in the command line.
> 
> If I understand correctly, DEBUG_LL is not an option for these platforms
> so if earlyprintk didn't display the log there is not much to do about
> it.

OK, sorry for the delay.  I've built a kernel and booted it as
you requested, and also found that the issue was due to this
memory area defined in arch/arm/boot/dts/rk3288.dtsi:

        reserved-memory {
                #address-cells = <2>;
                #size-cells = <2>;
                ranges;

                /*
                 * The rk3288 cannot use the memory area above 0xfe000000
                 * for dma operations for some reason. While there is
                 * probably a better solution available somewhere, we
                 * haven't found it yet and while devices with 2GB of ram
                 * are not affected, this issue prevents 4GB from booting.
                 * So to make these devices at least bootable, block
                 * this area for the time being until the real solution
                 * is found.
                 */
                dma-unusable@fe000000 {
                        reg = <0x0 0xfe000000 0x0 0x1000000>;
                };
        };
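With `#address-cells = <2>` and `#size-cells = <2>`, each `reg` entry
consists of two 32-bit cells for the base address followed by two for
the size, most significant cell first. The sketch below decodes the
entry above. `dt_cells_to_u64()` and `dt_decode_reg()` are hypothetical
helpers, loosely modelled on what the kernel's `of_read_number()` does,
and they assume the cells have already been converted from big-endian
to host byte order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine @ncells 32-bit cells (most significant first) into one 64-bit
 * value.  Cells are assumed to be in host byte order already; in a
 * flattened device tree blob they are stored big-endian.
 */
static uint64_t dt_cells_to_u64(const uint32_t *cells, int ncells)
{
	uint64_t val = 0;

	assert(ncells >= 1 && ncells <= 2);
	for (int i = 0; i < ncells; i++)
		val = (val << 32) | cells[i];
	return val;
}

/* Hypothetical decoded form of one reg entry. */
struct dt_reg {
	uint64_t base;
	uint64_t size;
};

/* Decode one <address size> pair using the parent's cell counts. */
static struct dt_reg dt_decode_reg(const uint32_t *reg, int addr_cells,
				   int size_cells)
{
	struct dt_reg r = {
		.base = dt_cells_to_u64(reg, addr_cells),
		.size = dt_cells_to_u64(reg + addr_cells, size_cells),
	};
	return r;
}
```

Decoding `<0x0 0xfe000000 0x0 0x1000000>` gives a base of 0xfe000000 and
a size of 0x1000000 (16 MiB), so the reserved block is
[0xfe000000, 0xff000000), ending 16 MiB below the 4 GiB boundary.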

So I've put a hack[1] on top of 950c37691925 to skip adding a
node in memblock_enforce_memory_reserved_overlap() if the base
address is 0xfe000000, which got the kernel booting.  Here's the
console log:

  https://people.collabora.com/~gtucker/tmp/2966825.txt

and the full test job details, if this helps:

  https://lava.collabora.co.uk/scheduler/job/2966825


I haven't really looked much further than that, but I'll be
available on Monday to help run other tests if needed.

Thanks,
Guillaume

[1] https://people.collabora.com/~gtucker/tmp/2966825.patch



* Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
  2020-12-18 21:59     ` Guillaume Tucker
@ 2021-01-03 13:47       ` Mike Rapoport
  2021-01-03 20:09         ` Andrea Arcangeli
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Rapoport @ 2021-01-03 13:47 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Andrea Arcangeli, Andrew Morton, Stephen Rothwell,
	kernelci-results-staging, kernelci-results, linux-mm,
	linux-kernel, Mike Rapoport, Baoquan He

On Fri, Dec 18, 2020 at 09:59:26PM +0000, Guillaume Tucker wrote:
> On 13/12/2020 08:23, Mike Rapoport wrote:
> > Hi Guillaume,
> > 
> > On Fri, Dec 11, 2020 at 09:53:46PM +0000, Guillaume Tucker wrote:
> >> Hi Mike,
> >>
> 
> OK, sorry for the delay.  I've built a kernel and booted it as
> you requested, and also found that the issue was due to this
> memory area defined in arch/arm/boot/dts/rk3288.dtsi:
> 
>         reserved-memory {
>                 #address-cells = <2>;
>                 #size-cells = <2>;
>                 ranges;
> 
>                 /*
>                  * The rk3288 cannot use the memory area above 0xfe000000
>                  * for dma operations for some reason. While there is
>                  * probably a better solution available somewhere, we
>                  * haven't found it yet and while devices with 2GB of ram
>                  * are not affected, this issue prevents 4GB from booting.
>                  * So to make these devices at least bootable, block
>                  * this area for the time being until the real solution
>                  * is found.
>                  */
>                 dma-unusable@fe000000 {
>                         reg = <0x0 0xfe000000 0x0 0x1000000>;
>                 };
>         };
> 
> So I've put a hack[1] on top of 950c37691925 to skip adding a
> node in memblock_enforce_memory_reserved_overlap() if the base
> address is 0xfe000000, which got the kernel booting.  Here's the
> console log:
> 
>   https://people.collabora.com/~gtucker/tmp/2966825.txt
> 
> and the full test job details, if this helps:
> 
>   https://lava.collabora.co.uk/scheduler/job/2966825
> 
> 
> I haven't really looked much further than that, but I'll be
> available on Monday to help run other tests if needed.

Sorry for the delay; I was mostly offline for the last three weeks.

Thanks for the logs, it seems that implicitly adding reserved regions to
memblock.memory wasn't that bright an idea :)
 
> Thanks,
> Guillaume
> 
> [1] https://people.collabora.com/~gtucker/tmp/2966825.patch

-- 
Sincerely yours,
Mike.



* Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
  2021-01-03 13:47       ` Mike Rapoport
@ 2021-01-03 20:09         ` Andrea Arcangeli
  2021-01-05  9:13           ` Mike Rapoport
  0 siblings, 1 reply; 9+ messages in thread
From: Andrea Arcangeli @ 2021-01-03 20:09 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Guillaume Tucker, Andrew Morton, Stephen Rothwell,
	kernelci-results-staging, kernelci-results, linux-mm,
	linux-kernel, Mike Rapoport, Baoquan He

Hello Mike,

On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
> Thanks for the logs, it seems that implicitly adding reserved regions to
> memblock.memory wasn't that bright an idea :)

Would it be possible to somehow clean up the hack then?

The only difference between the clean solution and the hack is that
the hack was intended to achieve the same result, but without adding the
reserved regions to memblock.memory.

The comment on that problematic area says the reserved area cannot be
used for DMA because of some unexplained hw issue, and that doing so
prevents booting, but since the area got reserved, even with the clean
solution, it should never have been used for DMA?

So I can only imagine that the physical memory region is way more
problematic than just for DMA. It sounds like anything that
touches it, including the CPU, will hang the system, not just DMA. It
sounds somewhat similar to the other e820 direct mapping issue on x86?

If you want to test the hack on the arm board to check if it boots you
can use the below commit:

https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc

Thanks,
Andrea




* Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
  2021-01-03 20:09         ` Andrea Arcangeli
@ 2021-01-05  9:13           ` Mike Rapoport
  2021-01-12 10:53             ` Guillaume Tucker
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Rapoport @ 2021-01-05  9:13 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Guillaume Tucker, Andrew Morton, Stephen Rothwell,
	kernelci-results-staging, kernelci-results, linux-mm,
	linux-kernel, Mike Rapoport, Baoquan He

On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
> Hello Mike,
> 
> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
> > Thanks for the logs, it seems that implicitly adding reserved regions to
> > memblock.memory wasn't that bright an idea :)
> 
> Would it be possible to somehow clean up the hack then?
> 
> The only difference between the clean solution and the hack is that
> the hack was intended to achieve the same result, but without adding the
> reserved regions to memblock.memory.

I didn't consider adding reserved regions to memblock.memory a clean
solution; it was still a hack, but I didn't think that things were that
fragile.

I still think we cannot rely on memblock.reserved to detect
memory/zone/node sizes and the boot failure reported here confirms this.
 
> The comment on that problematic area says the reserved area cannot be
> used for DMA because of some unexplained hw issue, and that doing so
> prevents booting, but since the area got reserved, even with the clean
> solution, it should never have been used for DMA?
>
> So I can only imagine that the physical memory region is way more
> problematic than just for DMA. It sounds like anything that
> touches it, including the CPU, will hang the system, not just DMA. It
> sounds somewhat similar to the other e820 direct mapping issue on x86?

My understanding is that the boot failed because, when I implicitly added
the reserved region to memblock.memory, the memory size seen by
free_area_init() jumped from 2G to 4G, since the reserved area was close
to 4G. The very first allocation would get a chunk from slightly below
4G, and as there is no real memory there, the kernel would crash.
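The size jump described above can be checked with simple arithmetic.
The sketch below assumes 4 KiB pages and, purely for illustration,
2 GiB of RAM starting at physical address 0 (the thread does not spell
out the board's memory map). It derives a PFN span from a set of memory
ranges, rounding region starts up and region ends down to whole pages,
and compares the span with and without the reserved block. `struct
range` and `pfn_span()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* First whole page at or above @addr, and page containing the end. */
static uint64_t pfn_up(uint64_t addr)
{
	return (addr + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
}

static uint64_t pfn_down(uint64_t addr)
{
	return addr >> PAGE_SHIFT;
}

/*
 * Compute [*start_pfn, *end_pfn) spanning all @n ranges in @mem, i.e.
 * the lowest and highest PFN present, in the spirit of how node and
 * zone boundaries are derived from memblock.memory.
 */
static void pfn_span(const struct range *mem, int n,
		     uint64_t *start_pfn, uint64_t *end_pfn)
{
	assert(n > 0);
	*start_pfn = UINT64_MAX;
	*end_pfn = 0;
	for (int i = 0; i < n; i++) {
		if (pfn_up(mem[i].start) < *start_pfn)
			*start_pfn = pfn_up(mem[i].start);
		if (pfn_down(mem[i].end) > *end_pfn)
			*end_pfn = pfn_down(mem[i].end);
	}
}
```

Under these assumptions the end PFN grows from 0x80000 (2 GiB) to
0xff000 (4080 MiB), even though, in the assumed layout, nothing between
0x80000000 and 0xfe000000 is backed by RAM. This matches the 2G to 4G
jump described above.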
 
> If you want to test the hack on the arm board to check if it boots you
> can use the below commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc

My take is that your solution would boot with this memory configuration, but I
still don't think that using memblock.reserved for zone/node sizing is
correct.

> Thanks,
> Andrea
> 

-- 
Sincerely yours,
Mike.



* Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
  2021-01-05  9:13           ` Mike Rapoport
@ 2021-01-12 10:53             ` Guillaume Tucker
  2021-01-12 11:10               ` Guillaume Tucker
  2021-01-12 12:06               ` Mike Rapoport
  0 siblings, 2 replies; 9+ messages in thread
From: Guillaume Tucker @ 2021-01-12 10:53 UTC (permalink / raw)
  To: Mike Rapoport, Andrea Arcangeli
  Cc: Andrew Morton, Stephen Rothwell, kernelci-results-staging,
	kernelci-results, linux-mm, linux-kernel, Mike Rapoport,
	Baoquan He

On 05/01/2021 09:13, Mike Rapoport wrote:
> On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
>> Hello Mike,
>>
>> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
>>> Thanks for the logs, it seems that implicitly adding reserved regions to
>>> memblock.memory wasn't that bright an idea :)
>>
>> Would it be possible to somehow clean up the hack then?
>>
>> The only difference between the clean solution and the hack is that
>> the hack was intended to achieve the same result, but without adding the
>> reserved regions to memblock.memory.
> 
> I didn't consider adding reserved regions to memblock.memory a clean
> solution; it was still a hack, but I didn't think that things were that
> fragile.
> 
> I still think we cannot rely on memblock.reserved to detect
> memory/zone/node sizes and the boot failure reported here confirms this.
>  
>> The comment on that problematic area says the reserved area cannot be
>> used for DMA because of some unexplained hw issue, and that doing so
>> prevents booting, but since the area got reserved, even with the clean
>> solution, it should never have been used for DMA?
>>
>> So I can only imagine that the physical memory region is way more
>> problematic than just for DMA. It sounds like anything that
>> touches it, including the CPU, will hang the system, not just DMA. It
>> sounds somewhat similar to the other e820 direct mapping issue on x86?
> 
> My understanding is that the boot failed because, when I implicitly added
> the reserved region to memblock.memory, the memory size seen by
> free_area_init() jumped from 2G to 4G, since the reserved area was close
> to 4G. The very first allocation would get a chunk from slightly below
> 4G, and as there is no real memory there, the kernel would crash.
>  
>> If you want to test the hack on the arm board to check if it boots you
>> can use the below commit:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc
> 
> My take is that your solution would boot with this memory configuration, but I
> still don't think that using memblock.reserved for zone/node sizing is
> correct.

The rk3288 platform has now been failing to boot for nearly a
month on linux-next:

  https://kernelci.org/test/case/id/5ffbed0a31ad81239bc94cdb/

Until a fix or a new version of this patch is made, would it be
possible to drop or revert it so that the platform becomes usable
again?

Or if you want, I can make a cleaned-up version of my hack to
ignore the problematic region if you still need your patch to be
on linux-next, but that would probably be less than ideal.

Thanks,
Guillaume



* Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
  2021-01-12 10:53             ` Guillaume Tucker
@ 2021-01-12 11:10               ` Guillaume Tucker
  2021-01-12 12:06               ` Mike Rapoport
  1 sibling, 0 replies; 9+ messages in thread
From: Guillaume Tucker @ 2021-01-12 11:10 UTC (permalink / raw)
  To: Mike Rapoport, Andrea Arcangeli
  Cc: Andrew Morton, Stephen Rothwell, kernelci-results-staging,
	kernelci-results, linux-mm, linux-kernel, Mike Rapoport,
	Baoquan He

On 12/01/2021 10:53, Guillaume Tucker wrote:
> On 05/01/2021 09:13, Mike Rapoport wrote:
>> On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
>>> Hello Mike,
>>>
>>> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
>>>> Thanks for the logs, it seems that implicitly adding reserved regions to
>>>> memblock.memory wasn't that bright an idea :)
>>>
>>> Would it be possible to somehow clean up the hack then?
>>>
>>> The only difference between the clean solution and the hack is that
>>> the hack was intended to achieve the same result, but without adding the
>>> reserved regions to memblock.memory.
>>
>> I didn't consider adding reserved regions to memblock.memory a clean
>> solution; it was still a hack, but I didn't think that things were that
>> fragile.
>>
>> I still think we cannot rely on memblock.reserved to detect
>> memory/zone/node sizes and the boot failure reported here confirms this.
>>  
>>> The comment on that problematic area says the reserved area cannot be
>>> used for DMA because of some unexplained hw issue, and that doing so
>>> prevents booting, but since the area got reserved, even with the clean
>>> solution, it should never have been used for DMA?
>>>
>>> So I can only imagine that the physical memory region is way more
>>> problematic than just for DMA. It sounds like anything that
>>> touches it, including the CPU, will hang the system, not just DMA. It
>>> sounds somewhat similar to the other e820 direct mapping issue on x86?
>>
>> My understanding is that the boot failed because, when I implicitly added
>> the reserved region to memblock.memory, the memory size seen by
>> free_area_init() jumped from 2G to 4G, since the reserved area was close
>> to 4G. The very first allocation would get a chunk from slightly below
>> 4G, and as there is no real memory there, the kernel would crash.
>>  
>>> If you want to test the hack on the arm board to check if it boots you
>>> can use the below commit:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc
>>
>> My take is that your solution would boot with this memory configuration, but I
>> still don't think that using memblock.reserved for zone/node sizing is
>> correct.
> 
> The rk3288 platform has now been failing to boot for nearly a
> month on linux-next:
> 
>   https://kernelci.org/test/case/id/5ffbed0a31ad81239bc94cdb/
> 
> Until a fix or a new version of this patch is made, would it be
> possible to drop or revert it so that the platform becomes usable
> again?
> 
> Or if you want, I can make a cleaned-up version of my hack to
> ignore the problematic region if you still need your patch to be
> on linux-next, but that would probably be less than ideal.

By the way, another bisection found that this commit is also
breaking tegra124-nyan-big, but only with both CONFIG_EFI=y and
CONFIG_ARM_LPAE=y enabled:

  https://kernelci.org/test/case/id/5ff6b1e26cf19f3b10c94cc5/

The plain multi_v7_defconfig is booting fine:

  https://kernelci.org/test/plan/id/5ff6b0a1db91b8a2b9c94cba/

I haven't looked into this one or tried to make it boot like
rk3288, but please let me know if there's anything there that can
be done to help.

Thanks,
Guillaume



* Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
  2021-01-12 10:53             ` Guillaume Tucker
  2021-01-12 11:10               ` Guillaume Tucker
@ 2021-01-12 12:06               ` Mike Rapoport
  1 sibling, 0 replies; 9+ messages in thread
From: Mike Rapoport @ 2021-01-12 12:06 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Andrea Arcangeli, Andrew Morton, Stephen Rothwell,
	kernelci-results-staging, kernelci-results, linux-mm,
	linux-kernel, Mike Rapoport, Baoquan He

On Tue, Jan 12, 2021 at 10:53:45AM +0000, Guillaume Tucker wrote:
> On 05/01/2021 09:13, Mike Rapoport wrote:
> > On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
> >> Hello Mike,
> >>
> >> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
> >>> Thanks for the logs, it seems that implicitly adding reserved regions to
> >>> memblock.memory wasn't that bright an idea :)
> >>
> >> Would it be possible to somehow clean up the hack then?
> >>
> >> The only difference between the clean solution and the hack is that
> >> the hack was intended to achieve the same result, but without adding the
> >> reserved regions to memblock.memory.
> > 
> > I didn't consider adding reserved regions to memblock.memory a clean
> > solution; it was still a hack, but I didn't think that things were that
> > fragile.
> > 
> > I still think we cannot rely on memblock.reserved to detect
> > memory/zone/node sizes and the boot failure reported here confirms this.
> >  
> >> The comment on that problematic area says the reserved area cannot be
> >> used for DMA because of some unexplained hw issue, and that doing so
> >> prevents booting, but since the area got reserved, even with the clean
> >> solution, it should never have been used for DMA?
> >>
> >> So I can only imagine that the physical memory region is way more
> >> problematic than just for DMA. It sounds like anything that
> >> touches it, including the CPU, will hang the system, not just DMA. It
> >> sounds somewhat similar to the other e820 direct mapping issue on x86?
> > 
> > My understanding is that the boot failed because, when I implicitly added
> > the reserved region to memblock.memory, the memory size seen by
> > free_area_init() jumped from 2G to 4G, since the reserved area was close
> > to 4G. The very first allocation would get a chunk from slightly below
> > 4G, and as there is no real memory there, the kernel would crash.
> >  
> >> If you want to test the hack on the arm board to check if it boots you
> >> can use the below commit:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc
> > 
> > My take is that your solution would boot with this memory configuration, but I
> > still don't think that using memblock.reserved for zone/node sizing is
> > correct.
> 
> The rk3288 platform has now been failing to boot for nearly a
> month on linux-next:
> 
>   https://kernelci.org/test/case/id/5ffbed0a31ad81239bc94cdb/
> 
> Until a fix or a new version of this patch is made, would it be
> possible to drop or revert it so that the platform becomes usable
> again?

There is a new version of these patches:

https://lore.kernel.org/lkml/20210111194017.22696-1-rppt@kernel.org

It's going to be in linux-next as soon as Andrew pushes mmotm.
 
> Or if you want, I can make a cleaned-up version of my hack to
> ignore the problematic region if you still need your patch to be
> on linux-next, but that would probably be less than ideal.
> 
> Thanks,
> Guillaume

-- 
Sincerely yours,
Mike.



end of thread, other threads:[~2021-01-12 12:07 UTC | newest]

Thread overview: 9+ messages
     [not found] <5fd3e5d9.1c69fb81.f9e69.5028@mx.google.com>
2020-12-11 21:53 ` kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging Guillaume Tucker
2020-12-13  8:23   ` Mike Rapoport
2020-12-18 21:59     ` Guillaume Tucker
2021-01-03 13:47       ` Mike Rapoport
2021-01-03 20:09         ` Andrea Arcangeli
2021-01-05  9:13           ` Mike Rapoport
2021-01-12 10:53             ` Guillaume Tucker
2021-01-12 11:10               ` Guillaume Tucker
2021-01-12 12:06               ` Mike Rapoport
