* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
       [not found]         ` <53A2BE94.2010308@gmail.com>
@ 2014-06-23  3:56             ` Tushar Behera
  0 siblings, 0 replies; 32+ messages in thread
From: Tushar Behera @ 2014-06-23  3:56 UTC (permalink / raw)
  To: Kevin Hilman, Laura Abbott
  Cc: Sachin Kamat, kernel-build-reports, linaro-kernel, Russell King,
	linux-samsung-soc, linux-arm-kernel

Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.

On 06/19/2014 04:12 PM, Tushar Behera wrote:
> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>> Sachin,
>>>>>
>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@linaro.org> wrote:
>>>>>>
>>>>>> Tree/Branch: mainline
>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>> Failed boot tests (console logs at the end)
>>>>>> ===========================================
>>>>>>      exynos5420-arndale-octa:     FAIL:    arm-exynos_defconfig
>>>>>>                 ste-snowball:     FAIL:    arm-u8500_defconfig
>>>>>
>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>> but still not failing every time.
>>>>>
>>>>> Kevin
>>>>>
>>>>
>>>> Hi Kevin,
>>>>
>>>> Same here.
>>>>
>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>> the board (by removing the power cord), the problem doesn't occur during
>>>> the next iteration.
>>>
>>> I don't ever use the soft-reset, I only toggle the wall power.  I
>>> don't ever actually remove the power cord though, I'm using a
>>> USB-controlled relay to toggle the wall power.
>>>
>>> Kevin
>>>
>>
>> Laura,
>>
>> We are getting the following kernel panic [1] (not always, but quite
>> regularly) while booting the Arndale-Octa (based on Samsung's Exynos5420)
>> board with an upstream kernel. I haven't observed this issue with other
>> boards yet.
>>
>> This issue is observed when I am booting with uImage + dtb (within
>> roughly 10 iterations).
>>
> 
> Some more information:
> 
> The boot logs are provided in pastebin, okay[2] and failed[3].
> 
> When the boot fails, I get a higher value for vm_total_pages (684424 in
> [3]). When the boot succeeds on my board, it is always 521232 [2].
> 
> [2] http://pastebin.com/1iLaizuL
> [3] http://pastebin.com/5tdDt4GL
> 
>> There is no issue when I am booting appended zImage (zImage+dtb). I
>> tried running it over 200 cycles, but without any failure.
>>
>> 'git bisect' points to this commit.
>> commit 1c2f87c22566 "ARM: 8025/1: Get rid of meminfo"
>>
>> Reverting this commit on top of v3.16-rc1-17-ge99cfa2, I tested for
>> around 100 iterations of booting with uImage+dtb, without any failure.
>>
>> [1] Kernel log
>> Unhandled fault: external abort on non-linefetch (0x008) at 0xffc00000
>> Internal error: : 8 [#1] PREEMPT SMP ARM
>> Modules linked in:
>> CPU: 0 PID: 1136 Comm: kworker/u16:0 Not tainted
>> 3.15.0-rc1-00027-g1c8c3cf-dirty #5
>> task: ed0f5800 ti: eda52000 task.ti: eda52000
>> PC is at __copy_to_user_std+0x4c/0x3a8
>> LR is at copy_page_to_iter+0xb0/0x26c
>> pc : [<c01b858c>]    lr : [<c00982c0>]    psr: 60000113
>> sp : eda53de4  ip : 00000000  fp : ee103040
>> r10: ed9fb700  r9 : 00000080  r8 : eda53eb8
>> r7 : ffc00000  r6 : 00000000  r5 : 00000080  r4 : eda53e78
>> r3 : 00000000  r2 : 00000000  r1 : ffc00000  r0 : ed9fb700
>> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
>> Control: 10c5387d  Table: 2000406a  DAC: 00000015
>> Process kworker/u16:0 (pid: 1136, stack limit = 0xeda52240)
>>
> 
> 


-- 
Tushar Behera
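
To put the two vm_total_pages values quoted above in perspective, here is a
minimal sketch that converts page counts to KiB. It assumes 4 KiB pages (the
usual page size on 32-bit ARM Linux, but an assumption here, since the thread
does not state it). The function names and the ASSUMED_PAGE_SIZE_KIB macro are
hypothetical helpers for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the thread does not state the page size. */
#define ASSUMED_PAGE_SIZE_KIB 4u

/* Convert a page count to KiB under the assumed page size. */
uint64_t pages_to_kib(uint64_t pages)
{
	return pages * ASSUMED_PAGE_SIZE_KIB;
}

/* Difference in KiB between two page counts; a must not be smaller than b. */
uint64_t page_delta_kib(uint64_t a, uint64_t b)
{
	assert(a >= b);
	return pages_to_kib(a - b);
}
```

Under that assumption, 521232 pages is 2084928 KiB (about 2036 MiB) and 684424
pages is 2737696 KiB (about 2674 MiB), so the failing boots report roughly
637 MiB more than the successful ones.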


* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-23  3:56             ` Tushar Behera
@ 2014-06-23 18:32               ` Kevin Hilman
  -1 siblings, 0 replies; 32+ messages in thread
From: Kevin Hilman @ 2014-06-23 18:32 UTC (permalink / raw)
  To: Tushar Behera
  Cc: Laura Abbott, Sachin Kamat, kernel-build-reports, linaro-kernel,
	Russell King, linux-samsung-soc, linux-arm-kernel

On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux@gmail.com> wrote:
> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>
> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>> Sachin,
>>>>>>
>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@linaro.org> wrote:
>>>>>>>
>>>>>>> Tree/Branch: mainline
>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>> Failed boot tests (console logs at the end)
>>>>>>> ===========================================
>>>>>>>      exynos5420-arndale-octa:     FAIL:    arm-exynos_defconfig
>>>>>>>                 ste-snowball:     FAIL:    arm-u8500_defconfig
>>>>>>
>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>> but still not failing every time.
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>>
>>>>> Hi Kevin,
>>>>>
>>>>> Same here.
>>>>>
>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>> the next iteration.
>>>>
>>>> I don't ever use the soft-reset, I only toggle the wall power.  I
>>>> don't ever actually remove the power cord though, I'm using a
>>>> USB-controlled relay to toggle the wall power.
>>>>
>>>> Kevin
>>>>
>>>
>>> Laura,
>>>
>>> We are getting the following kernel panic [1] (not always, but quite
>>> regularly) while booting the Arndale-Octa (based on Samsung's Exynos5420)
>>> board with an upstream kernel. I haven't observed this issue with other
>>> boards yet.
>>>
>>> This issue is observed when I am booting with uImage + dtb (within
>>> roughly 10 iterations).
>>>
>>
>> Some more information:
>>
>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>
>> When the boot fails, I get a higher value for vm_total_pages (684424 in
>> [3]). When the boot succeeds on my board, it is always 521232 [2].

I can confirm that reverting the "Get rid of meminfo" patch gets the
Octa board booting reliably again for me also.

In case it helps, some boot logs for failures from the last couple of
linux-next build/boot cycles can be seen here:
http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log

Kevin


* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-23 18:32               ` Kevin Hilman
@ 2014-06-24 17:47                 ` Laura Abbott
  -1 siblings, 0 replies; 32+ messages in thread
From: Laura Abbott @ 2014-06-24 17:47 UTC (permalink / raw)
  To: Kevin Hilman, Tushar Behera
  Cc: linux-samsung-soc, kernel-build-reports, linaro-kernel,
	Russell King, Sachin Kamat, linux-arm-kernel

On 6/23/2014 11:32 AM, Kevin Hilman wrote:
> On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux@gmail.com> wrote:
>> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>>
>> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>>> Sachin,
>>>>>>>
>>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@linaro.org> wrote:
>>>>>>>>
>>>>>>>> Tree/Branch: mainline
>>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>>> Failed boot tests (console logs at the end)
>>>>>>>> ===========================================
>>>>>>>>      exynos5420-arndale-octa:     FAIL:    arm-exynos_defconfig
>>>>>>>>                 ste-snowball:     FAIL:    arm-u8500_defconfig
>>>>>>>
>>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>>> but still not failing every time.
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>
>>>>>> Hi Kevin,
>>>>>>
>>>>>> Same here.
>>>>>>
>>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>>> the next iteration.
>>>>>
>>>>> I don't ever use the soft-reset, I only toggle the wall power.  I
>>>>> don't ever actually remove the power cord though, I'm using a
>>>>> USB-controlled relay to toggle the wall power.
>>>>>
>>>>> Kevin
>>>>>
>>>>
>>>> Laura,
>>>>
>>>> We are getting the following kernel panic [1] (not always, but quite
>>>> regularly) while booting the Arndale-Octa (based on Samsung's Exynos5420)
>>>> board with an upstream kernel. I haven't observed this issue with other
>>>> boards yet.
>>>>
>>>> This issue is observed when I am booting with uImage + dtb (within
>>>> roughly 10 iterations).
>>>>
>>>
>>> Some more information:
>>>
>>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>>
>>> When the boot fails, I get a higher value for vm_total_pages (684424 in
>>> [3]). When the boot succeeds on my board, it is always 521232 [2].
> 
> I can confirm that reverting the "Get rid of meminfo" patch gets the
> Octa board booting reliably again for me also.
> 
> In case it helps, some boot logs for failures from the last couple of
> linux-next build/boot cycles can be seen here:
> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
> 

Sorry, I missed this yesterday. I'm going to take a look.

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-24 17:47                 ` Laura Abbott
@ 2014-06-24 22:29                   ` Laura Abbott
  -1 siblings, 0 replies; 32+ messages in thread
From: Laura Abbott @ 2014-06-24 22:29 UTC (permalink / raw)
  To: Kevin Hilman, Tushar Behera
  Cc: linux-samsung-soc, Russell King, kernel-build-reports,
	linaro-kernel, linux-arm-kernel

On 6/24/2014 10:47 AM, Laura Abbott wrote:
> On 6/23/2014 11:32 AM, Kevin Hilman wrote:
>> On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>>>
>>> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>>>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>>>> Sachin,
>>>>>>>>
>>>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@linaro.org> wrote:
>>>>>>>>>
>>>>>>>>> Tree/Branch: mainline
>>>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>>>> Failed boot tests (console logs at the end)
>>>>>>>>> ===========================================
>>>>>>>>>      exynos5420-arndale-octa:     FAIL:    arm-exynos_defconfig
>>>>>>>>>                 ste-snowball:     FAIL:    arm-u8500_defconfig
>>>>>>>>
>>>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>>>> but still not failing every time.
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>
>>>>>>>
>>>>>>> Hi Kevin,
>>>>>>>
>>>>>>> Same here.
>>>>>>>
>>>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>>>> the next iteration.
>>>>>>
>>>>>> I don't ever use the soft-reset, I only toggle the wall power.  I
>>>>>> don't ever actually remove the power cord though, I'm using a
>>>>>> USB-controlled relay to toggle the wall power.
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>>
>>>>> Laura,
>>>>>
>>>>> We are getting the following kernel panic [1] (not always, but quite
>>>>> regularly) while booting the Arndale-Octa (based on Samsung's Exynos5420)
>>>>> board with an upstream kernel. I haven't observed this issue with other
>>>>> boards yet.
>>>>>
>>>>> This issue is observed when I am booting with uImage + dtb (within
>>>>> roughly 10 iterations).
>>>>>
>>>>
>>>> Some more information:
>>>>
>>>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>>>
>>>> When the boot fails, I get a higher value for vm_total_pages (684424 in
>>>> [3]). When the boot succeeds on my board, it is always 521232 [2].
>>
>> I can confirm that reverting the "Get rid of meminfo" patch gets the
>> Octa board booting reliably again for me also.
>>
>> In case it helps, some boot logs for failures from the last couple of
>> linux-next build/boot cycles can be seen here:
>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>
> 
> Sorry, I missed this yesterday. I'm going to take a look.
> 

Were all of 

http://pastebin.com/1iLaizuL
http://pastebin.com/5tdDt4GL
http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log

collected on the same type of board with the same amount of DRAM? I'm seeing a
different number of total pages across all those logs. All the logs have the
same lowmem limit so it seems like the upper bound was being calculated
incorrectly for passing to free_area_init_node. Nothing is immediately jumping
out at me so can you boot up with a small debug patch?

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 659c75d..88eac1f 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -187,6 +187,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
        unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
        struct memblock_region *reg;
 
+       pr_err("XXXXXXX min %lx max_low %lx max_high %lx\n", min, max_low, max_high);
+       __memblock_dump_all();
        /*
         * initialise the zones.
         */

It would be helpful to do this across a few bootups to see if the values are
actually consistent. I'll keep looking in the meantime.

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
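
As a rough illustration of the three values the debug patch above prints, the
userspace sketch below derives min, max_low and max_high (as page frame
numbers) from a list of memory regions and a lowmem limit. This is a
simplified model written for this note, not the kernel's code: the
sketch_region and sketch_limits types, the sketch_find_limits function, the
4 KiB page size and every address in the usage note are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for one memory region known to the allocator. */
struct sketch_region {
	uint64_t base; /* physical start address in bytes */
	uint64_t size; /* length in bytes */
};

/* The three bounds, expressed as page frame numbers (PFNs). */
struct sketch_limits {
	uint64_t min;      /* first PFN of DRAM, rounded up */
	uint64_t max_low;  /* PFN at the lowmem limit, rounded down */
	uint64_t max_high; /* PFN just past the end of DRAM, rounded down */
};

struct sketch_limits sketch_find_limits(const struct sketch_region *regs,
					size_t count, uint64_t lowmem_limit)
{
	struct sketch_limits lim;
	uint64_t start, end;
	size_t i;

	assert(regs != NULL && count > 0);

	/* Find the lowest start and highest end over all regions. */
	start = regs[0].base;
	end = regs[0].base + regs[0].size;
	for (i = 1; i < count; i++) {
		uint64_t r_end = regs[i].base + regs[i].size;

		if (regs[i].base < start)
			start = regs[i].base;
		if (r_end > end)
			end = r_end;
	}

	lim.min = (start + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	lim.max_low = lowmem_limit >> SKETCH_PAGE_SHIFT;
	lim.max_high = end >> SKETCH_PAGE_SHIFT;
	return lim;
}
```

With a single hypothetical region at 0x20000000 of size 0x80000000 and a
lowmem limit of 0x4f800000, this gives min 0x20000, max_low 0x4f800 and
max_high 0xa0000. Adding a second, made-up region at 0xa0000000 of size
0x10000000 moves only max_high (to 0xb0000), which is the kind of pattern the
printed values would show if an unexpected region were registered: the same
lowmem limit but a larger upper bound.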


* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-23 18:32               ` Kevin Hilman
@ 2014-06-25  2:07                 ` Andreas Färber
  -1 siblings, 0 replies; 32+ messages in thread
From: Andreas Färber @ 2014-06-25  2:07 UTC (permalink / raw)
  To: Kevin Hilman, Tushar Behera, Laura Abbott
  Cc: Sachin Kamat, kernel-build-reports, linaro-kernel, Russell King,
	linux-samsung-soc, linux-arm-kernel

Am 23.06.2014 20:32, schrieb Kevin Hilman:
> On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux@gmail.com> wrote:
>> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>>
>> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>>> Sachin,
>>>>>>>
>>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@linaro.org> wrote:
>>>>>>>>
>>>>>>>> Tree/Branch: mainline
>>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>>> Failed boot tests (console logs at the end)
>>>>>>>> ===========================================
>>>>>>>>      exynos5420-arndale-octa:     FAIL:    arm-exynos_defconfig
>>>>>>>>                 ste-snowball:     FAIL:    arm-u8500_defconfig
>>>>>>>
>>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>>> but still not failing every time.
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>
>>>>>> Hi Kevin,
>>>>>>
>>>>>> Same here.
>>>>>>
>>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>>> the next iteration.
>>>>>
>>>>> I don't ever use the soft-reset, I only toggle the wall power.  I
>>>>> don't ever actually remove the power cord though, I'm using a
>>>>> USB-controlled relay to toggle the wall power.
>>>>>
>>>>> Kevin
>>>>>
>>>>
>>>> Laura,
>>>>
>>>> We are getting the following kernel panic [1] (not always, but quite
>>>> regularly) while booting the Arndale-Octa (based on Samsung's Exynos5420)
>>>> board with an upstream kernel. I haven't observed this issue with other
>>>> boards yet.
>>>>
>>>> This issue is observed when I am booting with uImage + dtb (within
>>>> roughly 10 iterations).
>>>>
>>>
>>> Some more information:
>>>
>>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>>
>>> In case of boot failures, I am getting a higher value for vm_total_pages
>>> (684424 in [3]). In case of successful boot on my board, it is always
>>> 521232 [2] on my board.
> 
> I can confirm that reverting the "Get rid of meminfo" patch gets the
> Octa board booting reliably again for me also.

Confirming that the revert [1] also fixes the issue I was reporting for
my Arndale Octa. I'm using zImage + dtb and had been resetting via J10.

Regards,
Andreas

[1] https://github.com/afaerber/linux/commits/arndale-octa-next

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-24 22:29                   ` Laura Abbott
@ 2014-06-25 12:13                     ` Tushar Behera
  -1 siblings, 0 replies; 32+ messages in thread
From: Tushar Behera @ 2014-06-25 12:13 UTC (permalink / raw)
  To: Laura Abbott, Kevin Hilman
  Cc: linux-samsung-soc, Russell King, kernel-build-reports,
	linaro-kernel, linux-arm-kernel

On 06/25/2014 03:59 AM, Laura Abbott wrote:
> On 6/24/2014 10:47 AM, Laura Abbott wrote:
>> On 6/23/2014 11:32 AM, Kevin Hilman wrote:
>>> On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>>>>
>>>> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>>>>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>>>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>>>>> Sachin,
>>>>>>>>>
>>>>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@linaro.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Tree/Branch: mainline
>>>>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>>>>> Failed boot tests (console logs at the end)
>>>>>>>>>> ===========================================
>>>>>>>>>>      exynos5420-arndale-octa:     FAIL:    arm-exynos_defconfig
>>>>>>>>>>                 ste-snowball:     FAIL:    arm-u8500_defconfig
>>>>>>>>>
>>>>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>>>>> but still not failing every time.
>>>>>>>>>
>>>>>>>>> Kevin
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Kevin,
>>>>>>>>
>>>>>>>> Same here.
>>>>>>>>
>>>>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>>>>> next iteration.
>>>>>>>
>>>>>>> I don't ever use the soft-reset, I only toggle the wall power.  I
>>>>>>> don't ever actually remove the power cord though, I'm using a
>>>>>>> USB-controlled relay to toggle the wall power.
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>
>>>>>> Laura,
>>>>>>
>>>>>> We are getting following kernel panic [1] (not always, but quite
>>>>>> regularly) while booting Arndale-Octa (based on Samsung's Exynos5420)
>>>>>> board with upstream kernel. I haven't observed this issue with other
>>>>>> boards yet.
>>>>>>
>>>>>> This issue is observed when I am booting with uImage + dtb (within
>>>>>> roughly ~10 iterations).
>>>>>>
>>>>>
>>>>> Some more information:
>>>>>
>>>>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>>>>
>>>>> In case of boot failures, I am getting a higher value for vm_total_pages
>>>>> (684424 in [3]). In case of successful boot on my board, it is always
>>>>> 521232 [2] on my board.
>>>
>>> I can confirm that reverting the "Get rid of meminfo" patch gets the
>>> Octa board booting reliably again for me also.
>>>
>>> In case it helps, some boot logs for failures from the last couple
>>> linux-next build/boot cycles can be seen here:
>>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>
>>
>> Sorry, I missed this yesterday. I'm going to take a look.
>>
> 
> Were all of 
> 
> http://pastebin.com/1iLaizuL
> http://pastebin.com/5tdDt4GL
> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
> 
> collected on the same type of board with the same amount of DRAM? I'm seeing a
> different amount of total pages across all those logs. All the logs have the
> same lowmem limit so it seems like the upper bound was being calculated
> incorrectly for passing to free_area_init_node. Nothing is immediately jumping
> out at me so can you boot up with a small debug patch?
> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 659c75d..88eac1f 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -187,6 +187,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>         unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
>         struct memblock_region *reg;
>  
> +       pr_err("XXXXXXX min %lx max_low %lx max_high %lx\n", min, max_low, max_high);
> +       __memblock_dump_all();
>         /*
>          * initialise the zones.
>          */
> 
> It would be helpful to do this across a few bootups to see if the values are
> actually consistent. I'll keep looking in the meantime.
> 
> Thanks,
> Laura
> 

Thanks for the pointer, Laura. In the failing case, I am getting some
random memblock_add() calls from
drivers/of/fdt.c:early_init_dt_scan_memory.

The issue seems to come from u-boot, which is not updating the memory
subnode properly. I have a fix for u-boot, which I am testing right now.
I will send an update tomorrow after I do some more testing.
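
To make the failure mode concrete, here is a minimal sketch of how a "reg"
property is split into (base, size) pairs. It is not the kernel code:
read_cells() and decode_reg() are hypothetical helpers, and the assumption
that #address-cells and #size-cells are both 1 is inferred from the log
below (a reg size of 96 bytes producing twelve memblock_add lines), not
confirmed from the dts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (base, size) range decoded from a "reg" property. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Combine `cells` big-endian 32-bit cells starting at p into one value. */
static uint64_t read_cells(const uint8_t *p, unsigned int cells)
{
	uint64_t v = 0;

	for (unsigned int i = 0; i < cells * 4; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Decode every (base, size) pair in a reg property of `len` bytes.
 * Returns the number of ranges written to `out` (at most `max`).
 * Hypothetical helper for illustration; not a kernel function.
 */
static size_t decode_reg(const uint8_t *reg, size_t len,
			 unsigned int addr_cells, unsigned int size_cells,
			 struct mem_range *out, size_t max)
{
	size_t stride = (size_t)(addr_cells + size_cells) * 4;
	size_t n = 0;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);

	/* Every complete pair is decoded, whether or not it was filled in. */
	while (len >= stride && n < max) {
		out[n].base = read_cells(reg, addr_cells);
		out[n].size = read_cells(reg + addr_cells * 4, size_cells);
		reg += stride;
		len -= stride;
		n++;
	}
	return n;
}
```

Under those assumptions, a 96-byte property decodes to 12 ranges. The
property length alone decides how many ranges are added, so any entries
the bootloader leaves unwritten are still handed on as memory.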

Additional debug changes in the kernel:
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c4cddf0..bca82b3 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -817,7 +817,7 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname,

        endp = reg + (l / sizeof(__be32));

-       pr_debug("memory scan node %s, reg size %d, data: %x %x %x %x,\n",
+       pr_err("memory scan node %s, reg size %d, data: %x %x %x %x,\n",
            uname, l, reg[0], reg[1], reg[2], reg[3]);

        while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
@@ -891,6 +891,7 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
                size -= phys_offset - base;
                base = phys_offset;
        }
+       printk("trb: memblock_add base (%llx) size(%llx)\n", base, size);
        memblock_add(base, size);
 }


Kernel log:

memory scan node memory, reg size 96, data: 20 10 30 10,
trb: memblock_add base (20000000) size(10000000)
trb: memblock_add base (30000000) size(10000000)
trb: memblock_add base (40000000) size(10000000)
trb: memblock_add base (50000000) size(10000000)
trb: memblock_add base (60000000) size(10000000)
trb: memblock_add base (70000000) size(10000000)
trb: memblock_add base (80000000) size(10000000)
trb: memblock_add base (90000000) size(fa00000)
trb: memblock_add base (fffff000) size(fffff000)
trb: memblock_add base (ffeff000) size(fffff000)
trb: memblock_add base (fbfff000) size(fffff000)
trb: memblock_add base (fffff000) size(effff000)
Machine model: Insignal Arndale Octa evaluation board based on EXYNOS5420
bootconsole [earlycon0] enabled
Memory policy: Data cache writealloc
XXXXXXX min 20000 max_low 4f800 max_high fffff
MEMBLOCK configuration:
 memory size = 0x82a00fff reserved size = 0x75e947
 memory.cnt  = 0x4
 memory[0x0]     [0x00000020000000-0x00000042ffffff], 0x23000000 bytes flags: 0x0
 memory[0x1]     [0x00000043800000-0x00000050ffffff], 0xd800000 bytes flags: 0x0
 memory[0x2]     [0x00000051800000-0x0000009f9fffff], 0x4e200000 bytes flags: 0x0
 memory[0x3]     [0x000000fbfff000-0x000000fffffffe], 0x4000fff bytes flags: 0x0
 reserved.cnt  = 0x6
 reserved[0x0]   [0x00000020004000-0x00000020007fff], 0x4000 bytes flags: 0x0
 reserved[0x1]   [0x000000200082c0-0x0000002059cb7f], 0x5948c0 bytes flags: 0x0
 reserved[0x2]   [0x0000002fe45000-0x0000002fe4fea7], 0xaea8 bytes flags: 0x0
 reserved[0x3]   [0x0000002fe50000-0x0000002ffff09e], 0x1af09f bytes flags: 0x0
 reserved[0x4]   [0x0000004f7f3000-0x0000004f7fbfff], 0x9000 bytes flags: 0x0
 reserved[0x5]   [0x0000004f7fcec0-0x0000004f7fffff], 0x3140 bytes flags: 0x0
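
As a cross-check of the numbers in the log above, the following sketch
redoes the arithmetic. It assumes 4 KiB pages (PAGE_SHIFT of 12); the
struct and helper names are made up for illustration and are not kernel
APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Inclusive physical range, as printed by the MEMBLOCK dump. */
struct region {
	uint64_t first;
	uint64_t last;
};

/* Physical address of the first byte of page frame `pfn`. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Page frame number containing `addr`, rounding down. */
static uint64_t phys_to_pfn(uint64_t addr)
{
	return addr >> PAGE_SHIFT;
}

/* Sum of the sizes of `n` inclusive regions, in bytes. */
static uint64_t total_bytes(const struct region *r, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		assert(r[i].last >= r[i].first);
		sum += r[i].last - r[i].first + 1;
	}
	return sum;
}

/* The four memory[] entries from the failing boot log. */
static const struct region failing_boot[] = {
	{ 0x20000000ULL, 0x42ffffffULL },
	{ 0x43800000ULL, 0x50ffffffULL },
	{ 0x51800000ULL, 0x9f9fffffULL },
	{ 0xfbfff000ULL, 0xfffffffeULL },	/* bogus region */
};
```

With these inputs, total_bytes(failing_boot, 4) comes to 0x82a00fff, which
matches the "memory size" line, and pfn_to_phys(0xfffff) is 0xfffff000,
well past the last valid bank. Without the fourth entry the total is
0x7ea00000 and the end of memory falls at PFN 0x9fa00, which is presumably
what max_high would have been on a good boot.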


-- 
Tushar Behera

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-25 12:13                     ` Tushar Behera
@ 2014-06-25 21:57                       ` Laura Abbott
  -1 siblings, 0 replies; 32+ messages in thread
From: Laura Abbott @ 2014-06-25 21:57 UTC (permalink / raw)
  To: Tushar Behera, Kevin Hilman
  Cc: linux-samsung-soc, Russell King, kernel-build-reports,
	linaro-kernel, linux-arm-kernel

On 6/25/2014 5:13 AM, Tushar Behera wrote:
> On 06/25/2014 03:59 AM, Laura Abbott wrote:
>> On 6/24/2014 10:47 AM, Laura Abbott wrote:
>>> On 6/23/2014 11:32 AM, Kevin Hilman wrote:
>>>> On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>>>>>
>>>>> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>>>>>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>>>>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>>>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>>>>>> Sachin,
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@linaro.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Tree/Branch: mainline
>>>>>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>>>>>> Failed boot tests (console logs at the end)
>>>>>>>>>>> ===========================================
>>>>>>>>>>>      exynos5420-arndale-octa:     FAIL:    arm-exynos_defconfig
>>>>>>>>>>>                 ste-snowball:     FAIL:    arm-u8500_defconfig
>>>>>>>>>>
>>>>>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>>>>>> but still not failing every time.
>>>>>>>>>>
>>>>>>>>>> Kevin
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Kevin,
>>>>>>>>>
>>>>>>>>> Same here.
>>>>>>>>>
>>>>>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>>>>>> next iteration.
>>>>>>>>
>>>>>>>> I don't ever use the soft-reset, I only toggle the wall power.  I
>>>>>>>> don't ever actually remove the power cord though, I'm using a
>>>>>>>> USB-controlled relay to toggle the wall power.
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>
>>>>>>>
>>>>>>> Laura,
>>>>>>>
>>>>>>> We are getting following kernel panic [1] (not always, but quite
>>>>>>> regularly) while booting Arndale-Octa (based on Samsung's Exynos5420)
>>>>>>> board with upstream kernel. I haven't observed this issue with other
>>>>>>> boards yet.
>>>>>>>
>>>>>>> This issue is observed when I am booting with uImage + dtb (within
>>>>>>> roughly ~10 iterations).
>>>>>>>
>>>>>>
>>>>>> Some more information:
>>>>>>
>>>>>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>>>>>
>>>>>> In case of boot failures, I am getting a higher value for vm_total_pages
>>>>>> (684424 in [3]). In case of successful boot on my board, it is always
>>>>>> 521232 [2] on my board.
>>>>
>>>> I can confirm that reverting the "Get rid of meminfo" patch gets the
>>>> Octa board booting reliably again for me also.
>>>>
>>>> In case it helps, some boot logs for failures from the last couple
>>>> linux-next build/boot cycles can be seen here:
>>>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>>
>>>
>>> Sorry, I missed this yesterday. I'm going to take a look.
>>>
>>
>> Were all of 
>>
>> http://pastebin.com/1iLaizuL
>> http://pastebin.com/5tdDt4GL
>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>
>> collected on the same type of board with the same amount of DRAM? I'm seeing a
>> different amount of total pages across all those logs. All the logs have the
>> same lowmem limit so it seems like the upper bound was being calculated
>> incorrectly for passing to free_area_init_node. Nothing is immediately jumping
>> out at me so can you boot up with a small debug patch?
>>
>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>> index 659c75d..88eac1f 100644
>> --- a/arch/arm/mm/init.c
>> +++ b/arch/arm/mm/init.c
>> @@ -187,6 +187,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>>         unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
>>         struct memblock_region *reg;
>>  
>> +       pr_err("XXXXXXX min %lx max_low %lx max_high %lx\n", min, max_low, max_high);
>> +       __memblock_dump_all();
>>         /*
>>          * initialise the zones.
>>          */
>>
>> It would be helpful to do this across a few bootups to see if the values are
>> actually consistent. I'll keep looking in the meantime.
>>
>> Thanks,
>> Laura
>>
> 
> Thanks Laura for the pointer. In case of error, I am getting some random
> memblock_add() calls from drivers/of/fdt.c:early_init_dt_scan_memory.
> 
> The issue seems to be from u-boot, where it is not updating the memory
> subnode properly. I have got a fix for the u-boot, which I am testing
> right now. I will update tomorrow after I do some more test.
> 

I'm concerned about whether my change can stay as is if it is exposing
an issue in u-boot. Asking people to change bootloaders rarely ends
well. Can you elaborate on what u-boot is doing that would expose this
issue?

Thanks,
Laura


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-25 21:57                       ` Laura Abbott
@ 2014-06-26  6:44                         ` Tushar Behera
  -1 siblings, 0 replies; 32+ messages in thread
From: Tushar Behera @ 2014-06-26  6:44 UTC (permalink / raw)
  To: Laura Abbott, Kevin Hilman
  Cc: linux-samsung-soc, Russell King, kernel-build-reports,
	linaro-kernel, linux-arm-kernel

On 06/26/2014 03:27 AM, Laura Abbott wrote:
> On 6/25/2014 5:13 AM, Tushar Behera wrote:
>> On 06/25/2014 03:59 AM, Laura Abbott wrote:
>>> On 6/24/2014 10:47 AM, Laura Abbott wrote:
>>>> On 6/23/2014 11:32 AM, Kevin Hilman wrote:
>>>>> On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>>> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>>>>>>
>>>>>> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>>>>>>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>>>>>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>>>>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@gmail.com> wrote:
>>>>>>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>>>>>>> Sachin,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@linaro.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Tree/Branch: mainline
>>>>>>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>>>>>>> Failed boot tests (console logs at the end)
>>>>>>>>>>>> ===========================================
>>>>>>>>>>>>      exynos5420-arndale-octa:     FAIL:    arm-exynos_defconfig
>>>>>>>>>>>>                 ste-snowball:     FAIL:    arm-u8500_defconfig
>>>>>>>>>>>
>>>>>>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>>>>>>> but still not failing every time.
>>>>>>>>>>>
>>>>>>>>>>> Kevin
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Kevin,
>>>>>>>>>>
>>>>>>>>>> Same here.
>>>>>>>>>>
>>>>>>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>>>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>>>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>>>>>>> next iteration.
>>>>>>>>>
>>>>>>>>> I don't ever use the soft-reset, I only toggle the wall power.  I
>>>>>>>>> don't ever actually remove the power cord though, I'm using a
>>>>>>>>> USB-controlled relay to toggle the wall power.
>>>>>>>>>
>>>>>>>>> Kevin
>>>>>>>>>
>>>>>>>>
>>>>>>>> Laura,
>>>>>>>>
>>>>>>>> We are getting following kernel panic [1] (not always, but quite
>>>>>>>> regularly) while booting Arndale-Octa (based on Samsung's Exynos5420)
>>>>>>>> board with upstream kernel. I haven't observed this issue with other
>>>>>>>> boards yet.
>>>>>>>>
>>>>>>>> This issue is observed when I am booting with uImage + dtb (within
>>>>>>>> roughly ~10 iterations).
>>>>>>>>
>>>>>>>
>>>>>>> Some more information:
>>>>>>>
>>>>>>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>>>>>>
>>>>>>> In case of boot failures, I am getting a higher value for vm_total_pages
>>>>>>> (684424 in [3]). In case of successful boot on my board, it is always
>>>>>>> 521232 [2] on my board.
>>>>>
>>>>> I can confirm that reverting the "Get rid of meminfo" patch gets the
>>>>> Octa board booting reliably again for me also.
>>>>>
>>>>> In case it helps, some boot logs for failures from the last couple
>>>>> linux-next build/boot cycles can be seen here:
>>>>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>>>
>>>>
>>>> Sorry, I missed this yesterday. I'm going to take a look.
>>>>
>>>
>>> Were all of 
>>>
>>> http://pastebin.com/1iLaizuL
>>> http://pastebin.com/5tdDt4GL
>>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>
>>> collected on the same type of board with the same amount of DRAM? I'm seeing a
>>> different amount of total pages across all those logs. All the logs have the
>>> same lowmem limit so it seems like the upper bound was being calculated
>>> incorrectly for passing to free_area_init_node. Nothing is immediately jumping
>>> out at me so can you boot up with a small debug patch?
>>>
>>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>>> index 659c75d..88eac1f 100644
>>> --- a/arch/arm/mm/init.c
>>> +++ b/arch/arm/mm/init.c
>>> @@ -187,6 +187,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>>>         unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
>>>         struct memblock_region *reg;
>>>  
>>> +       pr_err("XXXXXXX min %lx max_low %lx max_high %lx\n", min, max_low, max_high);
>>> +       __memblock_dump_all();
>>>         /*
>>>          * initialise the zones.
>>>          */
>>>
>>> It would be helpful to do this across a few bootups to see if the values are
>>> actually consistent. I'll keep looking in the meantime.
>>>
>>> Thanks,
>>> Laura
>>>
>>
>> Thanks Laura for the pointer. In case of error, I am getting some random
>> memblock_add() calls from drivers/of/fdt.c:early_init_dt_scan_memory.
>>
>> The issue seems to be from u-boot, where it is not updating the memory
>> subnode properly. I have got a fix for the u-boot, which I am testing
>> right now. I will update tomorrow after I do some more test.
>>
> 
> I'm concerned my change can stay as is if this is exposing an issue
> in u-boot. Asking people to change bootloaders rarely ends well. Can
> you elaborate on what u-boot is doing that would be exposing this
> issue?
> 
> Thanks,
> Laura
> 
> 

Laura,

Here is my assessment of the current situation.

*Bug in the u-boot*
The current u-boot for the Arndale-Octa board defines NR_BANKS as 12, and
the core uses a global structure (gd->bd) to hold the start and size of
each bank. Depending on the SoC revision used on the board, the board
file [1] sets the start/size for either 8 or 12 banks. On the current
revision of Arndale-Octa boards, the board file always sets start/size
for 8 banks, leaving the start/size data for the remaining 4 banks
uninitialized.

But the u-boot core [2] writes the values of all 12 banks into the
memory subnode, thus potentially passing invalid data for the last 4
banks on to the kernel.

The issue can be fixed by resetting the start/size for unused memory
banks to 0/0.[3]
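
As a rough illustration of the bug and the fix described above, here is
a small user-space model. It is only a sketch, not the actual u-boot
code: struct dram_bank, board_init_banks(), count_reported_banks() and
the bank addresses are all made up for the example; only the numbers 12
and 8 come from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_BANKS	12	/* value defined by the Arndale-Octa u-boot */
#define BANKS_IN_USE	8	/* banks the board file fills on current boards */

/* Hypothetical stand-in for the per-bank start/size kept in gd->bd. */
struct dram_bank {
	uint32_t start;
	uint32_t size;
};

/*
 * Model of the board file: fill the banks that are really present.
 * With clear_unused != 0 the remaining entries are reset to 0/0,
 * which is the idea behind the fix; with 0 they keep whatever was
 * in memory before.
 */
void board_init_banks(struct dram_bank *bank, int clear_unused)
{
	int i;

	assert(bank != NULL);
	for (i = 0; i < BANKS_IN_USE; i++) {
		/* made-up layout: 256 MiB banks starting at 0x20000000 */
		bank[i].start = 0x20000000u + (uint32_t)i * 0x10000000u;
		bank[i].size = 0x10000000u;
	}
	if (clear_unused)
		memset(&bank[BANKS_IN_USE], 0,
		       (NR_BANKS - BANKS_IN_USE) * sizeof(bank[0]));
}

/*
 * Model of the core code: it walks all NR_BANKS entries, so every
 * entry with a non-zero size ends up being reported as memory.
 */
int count_reported_banks(const struct dram_bank *bank)
{
	int i, n = 0;

	for (i = 0; i < NR_BANKS; i++)
		if (bank[i].size != 0)
			n++;
	return n;
}
```

If the array holds stale data before board_init_banks() runs, the
variant without clearing reports all 12 entries, while the variant with
clearing reports only the 8 real banks.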

*Before migration to memblock*
DRAM banks were added through [4]. For Exynos systems, NR_BANKS was
defined as 8. The initial check that rejects any banks beyond NR_BANKS
was enough to hide this issue. The bootlog [5] (with some debug
messages) shows the invalid data, both in u-boot and in the kernel.
Please grep for "NR_BANKS too low, ignoring memory" in the bootlog.

*After migration to memblock*
Now that the memory banks are added through [6], all the memory banks
are added unconditionally, resulting in the panic.
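
To make the difference between the two paths concrete, here is a small
user-space sketch. It is not kernel code: struct bank and add_banks()
are hypothetical, and the real paths do more checking than this. It
only models the count-based cut-off, which is the part that matters
here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OLD_NR_BANKS	8	/* meminfo limit on Exynos, as stated above */

/* Hypothetical stand-in for one memory bank handed over by the bootloader. */
struct bank {
	uint64_t start;
	uint64_t size;
};

/*
 * Accept banks until 'cap' entries have been taken; later ones are
 * ignored. cap == OLD_NR_BANKS models the old meminfo path, cap == n
 * models the memblock path, where nothing is rejected on count alone.
 * Returns the number of banks accepted and stores their total size.
 */
size_t add_banks(const struct bank *in, size_t n, size_t cap, uint64_t *total)
{
	size_t i, used = 0;

	assert(in != NULL && total != NULL);
	*total = 0;
	for (i = 0; i < n; i++) {
		if (used >= cap)
			continue;	/* old path: "NR_BANKS too low, ignoring memory" */
		*total += in[i].size;
		used++;
	}
	return used;
}
```

With 12 entries of which the last 4 are garbage, the capped call counts
only the 8 real banks, while the uncapped call also adds the garbage
sizes to the total. That would be consistent with the larger page count
seen on the failing boots.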

IMO, the bug is in u-boot and we should fix that.

[1]
https://github.com/tusharbehera/u-boot/blob/tracking-arndale-octa-v2012.07/board/samsung/smdk5420/smdk5420.c#L158
[2]
https://github.com/tusharbehera/u-boot/blob/tracking-arndale-octa-v2012.07/arch/arm/lib/bootm.c#L80
[3]
https://github.com/tusharbehera/u-boot/commit/9be794e886603a80f2c8686a75187ae67ac2158d
[4]
https://github.com/tusharbehera/linux/blob/v3.15-rc1/arch/arm/kernel/setup.c#L629
[5] http://pastebin.com/vLP2oG1mP
[6]
https://github.com/tusharbehera/linux/blob/v3.16-rc1/drivers/of/fdt.c#L878


-- 
Tushar Behera

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-26  6:44                         ` Tushar Behera
@ 2014-06-26 14:59                           ` Kevin Hilman
  -1 siblings, 0 replies; 32+ messages in thread
From: Kevin Hilman @ 2014-06-26 14:59 UTC (permalink / raw)
  To: Tushar Behera
  Cc: Laura Abbott, linux-samsung-soc, Russell King,
	kernel-build-reports, linaro-kernel, linux-arm-kernel

Hi Tushar,

> Here is my assessment of the current situation.

Thanks for digging into this and the detailed diagnosis.

> *Bug in the u-boot*
> Current u-boot for Arndale-octa board has defined NR_BANKS as 12 and the
> core uses a global structure (gd->bd) to maintain the start and size of
> individual banks. Depending on the revision of SoC used on the board,
> the board file [1] updates the start/size for either 8 or 12 banks. In
> case of current revision of Arndale-Octa boards, the board file always
> updates start/size for 8 banks, leaving the start/size data for
> remaining 4 banks uninitialized.
>
> But the u-boot core[2] updates the value of all the 12 banks, thus
> potentially updating invalid data for last 4 banks.
>
> The issue can be fixed by resetting the start/size for unused memory
> banks to 0/0.[3]
>
> *Before migration to memblock*
> The path for adding DRAM banks was done through [4]. For Exynos systems,
> NR_BANKS was defined as 8. The initial check for rejecting any banks
> beyond NR_BANKS was good enough for fixing this issue. The bootlog[5]
> (with some debug messages) shows the invalid data, both in u-boot and
> kernel. Please grep for "NR_BANKS too low, ignoring memory" in the bootlog.
>
> *After migration to memblock*
> Now that the memory banks are added through [6], all the memory banks
> are getting updated unconditionally resulting in the panic.
>
> IMO, the bug is in u-boot and we should fix that.

I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
u-boot and haven't seen the boot failure yet after several boots with
next-20140625.

That being said, since it's not always feasible/practical to update
u-boot, and when it comes down to it, this is still a kernel
regression, we should also fix the kernel to sanity check the values
coming from u-boot, like it was doing before.

Could you (or Laura) come up with a way to recreate the sanity check
that was detecting this problem before and ignoring those banks?

Thanks,

Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-26 14:59                           ` Kevin Hilman
@ 2014-06-26 15:17                             ` Russell King - ARM Linux
  -1 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2014-06-26 15:17 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Tushar Behera, Laura Abbott, linux-samsung-soc,
	kernel-build-reports, linaro-kernel, linux-arm-kernel

On Thu, Jun 26, 2014 at 07:59:19AM -0700, Kevin Hilman wrote:
> I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
> u-boot and haven't seen the boot failure yet after several boots with
> next-20140625.
> 
> That being said, since it's not always feasible/practical to update
> u-boot, and when it comes down to it, this is still a kernel
> regression, we should also fix the kernel to sanity check the values
> coming from u-boot, like it was doing before.

It wasn't sanity checking the values (there is some sanity checking,
but the sanity checking doesn't catch this).

What caught it was that the kernel was configured to only look at the
first 8 of the 12 meminfo entries with ATAGs.  Since we no longer have
that limit, all meminfo entries are now looked at (since the kernel
doesn't need the limit.)

We could add back a soft-limit on the number of meminfo entries, but
this has to be platform specific.  Another entry to go into the
mach_info structures?
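
A rough user-space sketch of what such a per-platform soft limit could
look like. All names here (struct machine_model, struct bank_adder and
the helpers) are made up for illustration, and "0 means use the
default" is just one possible convention, not something taken from the
kernel:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEFAULT_BANK_LIMIT	12u	/* assumed default for this sketch */

/* Hypothetical per-platform description carrying the soft limit. */
struct machine_model {
	const char *name;
	unsigned int bank_limit;	/* 0 = use DEFAULT_BANK_LIMIT */
};

/* Running state while memory banks are being added. */
struct bank_adder {
	unsigned int limit;
	unsigned int count;
};

/* Must run before the first bank is added, or the limit is not known. */
void bank_adder_init(struct bank_adder *a, const struct machine_model *m)
{
	assert(a != NULL && m != NULL);
	a->limit = m->bank_limit ? m->bank_limit : DEFAULT_BANK_LIMIT;
	a->count = 0;
}

/* Returns 0 if the bank was accepted, -1 if it was ignored. */
int bank_adder_add(struct bank_adder *a, uint64_t start, uint64_t size)
{
	assert(a != NULL);
	if (a->count >= a->limit) {
		fprintf(stderr, "bank limit %u reached, ignoring memory at 0x%08llx\n",
			a->limit, (unsigned long long)start);
		return -1;
	}
	(void)size;	/* a real implementation would register start/size here */
	a->count++;
	return 0;
}
```

A platform that knows its bootloader passes junk beyond the 8th entry
would set bank_limit to 8; everything else keeps the default.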

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-26 14:59                           ` Kevin Hilman
@ 2014-06-26 17:04                             ` Andreas Färber
  -1 siblings, 0 replies; 32+ messages in thread
From: Andreas Färber @ 2014-06-26 17:04 UTC (permalink / raw)
  To: Kevin Hilman, Tushar Behera
  Cc: Laura Abbott, linux-samsung-soc, Russell King,
	kernel-build-reports, linaro-kernel, linux-arm-kernel

Hi Kevin and Tushar,

Am 26.06.2014 16:59, schrieb Kevin Hilman:
>> IMO, the bug is in u-boot and we should fix that.
> 
> I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
> u-boot and haven't seen the boot failure yet after several boots with
> next-20140625.

Could you clarify your test setup: Are you using the original InSignal
SPL [1] with just your own u-boot.bin? Or do you have access to some
newer Samsung-signed SPL?

> That being said, since it's not always feasible/practical to update
> u-boot, and when it comes down to it, this is still a kernel
> regression, we should also fix the kernel to sanity check the values
> coming from u-boot, like it was doing before.

Sounds good.

Apart from this memory issue here, I noticed that CPUs don't appear to
be in HYP mode for virtualization, which had required a signed SPL
update for the ODROID-XU [2]. And to me it looks as if there's no
Arndale Octa support in upstream U-Boot [3], no real maintenance on the
InSignal fork [4] and a policy of not cooperating with others [5].

Thanks,
Andreas

[1] http://forum.insignal.co.kr/viewtopic.php?f=6&t=3199
[2] http://forum.odroid.com/viewtopic.php?f=64&t=2778&start=40#p32581
[3]
http://git.denx.de/?p=u-boot.git;a=blob;f=boards.cfg;h=947f2bc5ba2794c94b3b2cea04664f005e025f9f;hb=HEAD#l286
[4] http://git.insignal.co.kr/insignal/arndale_octa-jb_mr1.1/u-boot/
[5] http://forum.insignal.co.kr/viewtopic.php?f=40&t=3613

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-26 15:17                             ` Russell King - ARM Linux
@ 2014-06-26 19:42                               ` Laura Abbott
  -1 siblings, 0 replies; 32+ messages in thread
From: Laura Abbott @ 2014-06-26 19:42 UTC (permalink / raw)
  To: Russell King - ARM Linux, Kevin Hilman
  Cc: linux-samsung-soc, kernel-build-reports, Tushar Behera,
	linaro-kernel, linux-arm-kernel

On 6/26/2014 8:17 AM, Russell King - ARM Linux wrote:
> On Thu, Jun 26, 2014 at 07:59:19AM -0700, Kevin Hilman wrote:
>> I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
>> u-boot and haven't seen the boot failure yet after several boots with
>> next-20140625.
>>
>> That being said, since it's not always feasible/practical to update
>> u-boot, and when it comes down to it, this is still a kernel
>> regression, we should also fix the kernel to sanity check the values
>> coming from u-boot, like it was doing before.
> 
> It wasn't sanity checking the values (there is some sanity checking,
> but the sanity checking doesn't catch this).
> 
> What caught it was that the kernel was configured to only look at the
> first 8 of the 12 meminfo entries with ATAGs.  Since we no longer have
> that limit, all meminfo entries are now looked at (since the kernel
> doesn't need the limit.)
> 
> We could add back a soft-limit on the number of meminfo entries, but
> this has to be platform specific.  Another entry to go into the
> mach_info structures?
> 

This is the least bad option I've come up with. It brings back
early_init_dt_add_memory_arch so we can use arm_add_memory and stop
adding memory if it reaches an upper threshold. I was debating setting
the default at 12 or 8 but setting at 12 seems like it would involve the
fewest platform changes.

Thanks,
Laura

----8<----
From 1a5265fd178fea0da432fa9d49ce28e78bd25e04 Mon Sep 17 00:00:00 2001
From: Laura Abbott <lauraa@codeaurora.org>
Date: Thu, 26 Jun 2014 11:23:44 -0700
Subject: [PATCH] arm: Add back maximum bank limit

Commit 1c2f87c22566cd057bc8cde10c37ae9da1a1bb76
(ARM: 8025/1: Get rid of meminfo) dropped the upper bound on
the number of memory banks that can be added as there was no
technical need in the kernel. It turns out though, some bootloaders
(specifically the arndale-octa exynos boards) may pass invalid memory
information and rely on the kernel to not parse this data. This is a
bug in the bootloader but we still need to work around this.
Re-introduce a maximum bank limit per board to prevent invalid banks
from being passed to the kernel.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
---
 arch/arm/include/asm/mach/arch.h |  8 ++++++--
 arch/arm/kernel/devtree.c        |  4 ++++
 arch/arm/kernel/setup.c          | 16 ++++++++++++++++
 arch/arm/mach-exynos/exynos.c    |  1 +
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 060a75e..2a436ac 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -40,6 +40,8 @@ struct machine_desc {
 	unsigned int		video_start;	/* start of video RAM	*/
 	unsigned int		video_end;	/* end of video RAM	*/
 
+	unsigned int		bank_limit;	/* maximum number of memory
+						 * banks to add */
 	unsigned char		reserve_lp0 :1;	/* never has lp0	*/
 	unsigned char		reserve_lp1 :1;	/* never has lp1	*/
 	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
@@ -85,7 +87,8 @@ static const struct machine_desc __mach_desc_##_type	\
  __used							\
  __attribute__((__section__(".arch.info.init"))) = {	\
 	.nr		= MACH_TYPE_##_type,		\
-	.name		= _name,
+	.name		= _name,			\
+	.bank_limit	= 12,
 
 #define MACHINE_END				\
 };
@@ -95,6 +98,7 @@ static const struct machine_desc __mach_desc_##_name	\
  __used							\
  __attribute__((__section__(".arch.info.init"))) = {	\
 	.nr		= ~0,				\
-	.name		= _namestr,
+	.name		= _namestr,			\
+	.bank_limit	= 12,
 
 #endif
diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c
index e94a157..ea9ce92 100644
--- a/arch/arm/kernel/devtree.c
+++ b/arch/arm/kernel/devtree.c
@@ -27,6 +27,10 @@
 #include <asm/mach/arch.h>
 #include <asm/mach-types.h>
 
+void __init early_init_dt_add_memory_arch(u64 base, u64 size)
+{
+	arm_add_memory(base, size);
+}
 
 #ifdef CONFIG_SMP
 extern struct of_cpu_method __cpu_method_of_table[];
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 8a16ee5..3ab94d1 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -629,11 +629,26 @@ void __init dump_machine_table(void)
 		/* can't use cpu_relax() here as it may require MMU setup */;
 }
 
+static unsigned int bank_cnt;
+static unsigned int max_cnt;
+
 int __init arm_add_memory(u64 start, u64 size)
 {
 	u64 aligned_start;
 
 	/*
+	 * Some buggy bootloaders rely on the old meminfo behavior of not adding
+	 * more than n banks since anything past that may contain invalid data.
+	 */
+	if (bank_cnt >= max_cnt) {
+		pr_crit("Max banks too low, ignoring memory at 0x%08llx\n",
+			(long long)start);
+		return -EINVAL;
+	}
+
+	bank_cnt++;
+
+	/*
 	 * Ensure that start/size are aligned to a page boundary.
 	 * Size is appropriately rounded down, start is rounded up.
 	 */
@@ -879,6 +894,7 @@ void __init setup_arch(char **cmdline_p)
 		mdesc = setup_machine_tags(__atags_pointer, __machine_arch_type);
 	machine_desc = mdesc;
 	machine_name = mdesc->name;
+	max_cnt = mdesc->bank_limit;
 
 	if (mdesc->reboot_mode != REBOOT_HARD)
 		reboot_mode = mdesc->reboot_mode;
diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
index f38cf7c..91283fd 100644
--- a/arch/arm/mach-exynos/exynos.c
+++ b/arch/arm/mach-exynos/exynos.c
@@ -350,4 +350,5 @@ DT_MACHINE_START(EXYNOS_DT, "SAMSUNG EXYNOS (Flattened Device Tree)")
 	.dt_compat	= exynos_dt_compat,
 	.restart	= exynos_restart,
 	.reserve	= exynos_reserve,
+	.bank_limit     = 8,
 MACHINE_END
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-26 19:42                               ` Laura Abbott
@ 2014-06-27  3:06                                 ` Tushar Behera
  -1 siblings, 0 replies; 32+ messages in thread
From: Tushar Behera @ 2014-06-27  3:06 UTC (permalink / raw)
  To: Laura Abbott, Russell King - ARM Linux, Kevin Hilman
  Cc: linux-samsung-soc, kernel-build-reports, linaro-kernel, linux-arm-kernel

On 06/27/2014 01:12 AM, Laura Abbott wrote:

>  
> +static unsigned int bank_cnt;
> +static unsigned int max_cnt;
> +
>  int __init arm_add_memory(u64 start, u64 size)
>  {
>  	u64 aligned_start;
>  
>  	/*
> +	 * Some buggy bootloaders rely on the old meminfo behavior of not adding
> +	 * more than n banks since anything past that may contain invalid data.
> +	 */
> +	if (bank_cnt >= max_cnt) {
> +		pr_crit("Max banks too low, ignoring memory at 0x%08llx\n",
> +			(long long)start);
> +		return -EINVAL;
> +	}
> +
> +	bank_cnt++;
> +
> +	/*
>  	 * Ensure that start/size are aligned to a page boundary.
>  	 * Size is appropriately rounded down, start is rounded up.
>  	 */
> @@ -879,6 +894,7 @@ void __init setup_arch(char **cmdline_p)
>  		mdesc = setup_machine_tags(__atags_pointer, __machine_arch_type);
>  	machine_desc = mdesc;
>  	machine_name = mdesc->name;
> +	max_cnt = mdesc->bank_limit;

arm_add_memory() is called before max_cnt is set, so max_cnt is still 0
and none of the memory banks get added [1].

setup_machine_fdt -> early_init_dt_scan -> early_init_dt_scan_memory

Would it make sense to re-introduce the config option ARM_NR_BANKS and
replace max_cnt with NR_BANKS?

[1] http://pastebin.com/MawYD7kb
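
To make the ordering problem concrete, here is a minimal user-space sketch
of the check from the quoted patch. It is not kernel code: model_add_memory()
and model_boot() are hypothetical helpers, the two bank addresses are made
up, and -1 stands in for -EINVAL. It only assumes what the patch shows, a
zero-initialized static limit that is compared before it has been assigned.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two statics added by the quoted patch. */
static unsigned int bank_cnt;
static unsigned int max_cnt;	/* stays 0 until the limit is assigned */

/* Simplified model of the bank-limit check at the top of arm_add_memory(). */
int model_add_memory(uint64_t start)
{
	if (bank_cnt >= max_cnt) {
		printf("Max banks too low, ignoring memory at 0x%08llx\n",
		       (unsigned long long)start);
		return -1;	/* stands in for -EINVAL */
	}

	bank_cnt++;
	return 0;
}

/*
 * Replay a boot with two (made-up) memory banks and return how many were
 * accepted. With set_limit_first == 0 the limit is assigned only after the
 * memory scan, which is the order described in the reply above.
 */
unsigned int model_boot(int set_limit_first, unsigned int limit)
{
	unsigned int accepted = 0;
	unsigned int i;

	bank_cnt = 0;
	max_cnt = 0;

	if (set_limit_first)
		max_cnt = limit;

	for (i = 0; i < 2; i++) {
		uint64_t start = 0x20000000ULL + (uint64_t)i * 0x40000000ULL;

		if (model_add_memory(start) == 0)
			accepted++;
	}

	if (!set_limit_first)
		max_cnt = limit;	/* too late: the scan is already done */

	return accepted;
}
```

In this model, model_boot(0, 8) accepts no banks because 0 >= 0 holds on
the first call, while model_boot(1, 8) accepts both.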

>  
>  	if (mdesc->reboot_mode != REBOOT_HARD)
>  		reboot_mode = mdesc->reboot_mode;
> diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
> index f38cf7c..91283fd 100644
> --- a/arch/arm/mach-exynos/exynos.c
> +++ b/arch/arm/mach-exynos/exynos.c
> @@ -350,4 +350,5 @@ DT_MACHINE_START(EXYNOS_DT, "SAMSUNG EXYNOS (Flattened Device Tree)")
>  	.dt_compat	= exynos_dt_compat,
>  	.restart	= exynos_restart,
>  	.reserve	= exynos_reserve,
> +	.bank_limit     = 8,
>  MACHINE_END
> 


-- 
Tushar Behera

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-26 17:04                             ` Andreas Färber
@ 2014-06-27  3:28                               ` Tushar Behera
  -1 siblings, 0 replies; 32+ messages in thread
From: Tushar Behera @ 2014-06-27  3:28 UTC (permalink / raw)
  To: Andreas Färber, Kevin Hilman
  Cc: Laura Abbott, linux-samsung-soc, Russell King,
	kernel-build-reports, linaro-kernel, linux-arm-kernel

On 06/26/2014 10:34 PM, Andreas Färber wrote:
> Hi Kevin and Tushar,
> 
> Am 26.06.2014 16:59, schrieb Kevin Hilman:
>>> IMO, the bug is in u-boot and we should fix that.
>>
>> I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
>> u-boot and haven't seen the boot failure yet after several boots with
>> next-20140625.
> 
> Could you clarify your test setup: Are you using the original InSignal
> SPL [1] with just your own u-boot.bin? Or do you have access to some
> newer Samsung-signed SPL?
> 

The u-boot changes for Arndale-Octa were done as part of an activity
within Linaro. Insignal signed the SPL binary for us. You can extract
the signed SPL binary from the following hwpack [6] (untar it with
tar xfz, then look in the u_boot folder [7]).

The source code for this u-boot can be found here.[8]

Just in case, commands to flash u-boot binaries are listed here.[9]

>> That being said, since it's not always feasible/practical to update
>> u-boot, and when it comes down to it, this is still a kernel
>> regression, we should also fix the kernel to sanity check the values
>> coming from u-boot, like it was doing before.
> 
> Sounds good.
> 
> Apart from this memory issue here, I noticed that CPUs don't appear to
> be in HYP mode for virtualization, which had required a signed SPL
> update for the ODROID-XU [2]. And to me it looks as if there's no
> Arndale Octa support in upstream U-Boot [3], no real maintenance on the
> InSignal fork [4] and a policy of not cooperating with others [5].
> 

Adding Arndale-Octa support to upstream U-Boot was on a TODO list, but
it didn't materialize for other reasons.

> Thanks,
> Andreas
> 
> [1] http://forum.insignal.co.kr/viewtopic.php?f=6&t=3199
> [2] http://forum.odroid.com/viewtopic.php?f=64&t=2778&start=40#p32581
> [3]
> http://git.denx.de/?p=u-boot.git;a=blob;f=boards.cfg;h=947f2bc5ba2794c94b3b2cea04664f005e025f9f;hb=HEAD#l286
> [4] http://git.insignal.co.kr/insignal/arndale_octa-jb_mr1.1/u-boot/
> [5] http://forum.insignal.co.kr/viewtopic.php?f=40&t=3613
> 
[6]
http://snapshots.linaro.org/kernel-hwpack/linux-linaro-tracking-ll-arndale-octa/442/hwpack_linaro-arndale-octa_20140626-442_armhf_supported.tar.gz
[7] <path_to_extracted_folder>/u_boot/usr/lib/u-boot/arndale_octa
[8]
git.linaro.org/landing-teams/working/samsung/u-boot.git/shortlog/refs/heads/tracking-arndale_octa
[9] http://pastebin.com/pfGF2giq

Thanks,
-- 
Tushar Behera

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-27  3:06                                 ` Tushar Behera
@ 2014-06-27  9:09                                   ` Laura Abbott
  -1 siblings, 0 replies; 32+ messages in thread
From: Laura Abbott @ 2014-06-27  9:09 UTC (permalink / raw)
  To: Tushar Behera, Russell King - ARM Linux, Kevin Hilman
  Cc: linux-samsung-soc, linux-arm-kernel, linaro-kernel, kernel-build-reports

On 6/26/2014 8:06 PM, Tushar Behera wrote:
> On 06/27/2014 01:12 AM, Laura Abbott wrote:
>
>>
>> +static unsigned int bank_cnt;
>> +static unsigned int max_cnt;
>> +
>>   int __init arm_add_memory(u64 start, u64 size)
>>   {
>>   	u64 aligned_start;
>>
>>   	/*
>> +	 * Some buggy bootloaders rely on the old meminfo behavior of not adding
>> +	 * more than n banks since anything past that may contain invalid data.
>> +	 */
>> +	if (bank_cnt >= max_cnt) {
>> +		pr_crit("Max banks too low, ignoring memory at 0x%08llx\n",
>> +			(long long)start);
>> +		return -EINVAL;
>> +	}
>> +
>> +	bank_cnt++;
>> +
>> +	/*
>>   	 * Ensure that start/size are aligned to a page boundary.
>>   	 * Size is appropriately rounded down, start is rounded up.
>>   	 */
>> @@ -879,6 +894,7 @@ void __init setup_arch(char **cmdline_p)
>>   		mdesc = setup_machine_tags(__atags_pointer, __machine_arch_type);
>>   	machine_desc = mdesc;
>>   	machine_name = mdesc->name;
>> +	max_cnt = mdesc->bank_limit;
>
> arm_add_memory is getting called before this is being set, resulting in
> none of the memory banks getting added[1].
>
> setup_machine_fdt -> early_init_dt_scan -> early_init_dt_scan_memory
>
> Would it make sense to re-introduce the config option ARM_NR_BANKS and
> replace max_cnt with NR_BANKS?
>
> [1] http://pastebin.com/MawYD7kb
>

I was hoping to avoid re-introducing the config option, but we may have
to if we can't make the machine_info approach work. I'll take a closer
look tomorrow.

Thanks,
Laura


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)
  2014-06-27  9:09                                   ` Laura Abbott
@ 2014-06-27  9:40                                     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2014-06-27  9:40 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Tushar Behera, Kevin Hilman, linux-samsung-soc, linux-arm-kernel,
	linaro-kernel, kernel-build-reports

On Fri, Jun 27, 2014 at 02:09:58AM -0700, Laura Abbott wrote:
> On 6/26/2014 8:06 PM, Tushar Behera wrote:
>> arm_add_memory is getting called before this is being set, resulting in
>> none of the memory banks getting added[1].
>>
>> setup_machine_fdt -> early_init_dt_scan -> early_init_dt_scan_memory
>>
>> Would it make sense to re-introduce the config option ARM_NR_BANKS and
>> replace max_cnt with NR_BANKS?
>>
>> [1] http://pastebin.com/MawYD7kb
>>
>
> I was hoping to avoid re-introducing the config option but that may be
> the case if we can't make the machine_info work. I'll take a better
> look tomorrow.

The problem with the config option is that it's not single zImage
friendly.
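
A rough sketch of why a build-time limit fits poorly with one image that
boots several platforms, compared with a per-machine value. Everything here
is hypothetical: MODEL_NR_BANKS stands in for a config option, the board
names are invented, and the fallback of 12 is just the default used in the
patch quoted earlier in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical build-time limit, standing in for a config option. */
#define MODEL_NR_BANKS 8

/* Hypothetical per-machine record, loosely modelled on machine_desc. */
struct model_machine {
	const char *name;
	unsigned int bank_limit;
};

/* Invented boards that would all be built into the same image. */
static const struct model_machine model_machines[] = {
	{ "board-a", 8 },
	{ "board-b", 12 },
};

/* Config-option style: one value for every board in the image. */
unsigned int model_limit_config(const char *name)
{
	(void)name;	/* the board cannot influence the result */
	return MODEL_NR_BANKS;
}

/* Per-machine style: the limit is chosen at boot from the matched board. */
unsigned int model_limit_runtime(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(model_machines) / sizeof(model_machines[0]); i++) {
		if (strcmp(model_machines[i].name, name) == 0)
			return model_machines[i].bank_limit;
	}

	return 12;	/* default taken from the quoted patch */
}
```

With the build-time constant, "board-b" is capped at 8 banks even though it
wants 12; the per-machine lookup gives each board its own value from the
same binary.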

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2014-06-27  9:40 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <539fdd37.e7bc420a.76b9.ffffb583@mx.google.com>
     [not found] ` <CAGa+x85nqPbH-sye28ni=gUEdgjGdUDqHhv1+0pV5aO8y1+wHQ@mail.gmail.com>
     [not found]   ` <53A106F1.10201@gmail.com>
     [not found]     ` <CAGa+x8527FEPk6fg5kv-fbOzK3MzcFooFGE9Me12b9C_Pv=UzA@mail.gmail.com>
     [not found]       ` <53A2AE11.2050208@gmail.com>
     [not found]         ` <53A2BE94.2010308@gmail.com>
2014-06-23  3:56           ` mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618) Tushar Behera
2014-06-23  3:56             ` Tushar Behera
2014-06-23 18:32             ` Kevin Hilman
2014-06-23 18:32               ` Kevin Hilman
2014-06-24 17:47               ` Laura Abbott
2014-06-24 17:47                 ` Laura Abbott
2014-06-24 22:29                 ` Laura Abbott
2014-06-24 22:29                   ` Laura Abbott
2014-06-25 12:13                   ` Tushar Behera
2014-06-25 12:13                     ` Tushar Behera
2014-06-25 21:57                     ` Laura Abbott
2014-06-25 21:57                       ` Laura Abbott
2014-06-26  6:44                       ` Tushar Behera
2014-06-26  6:44                         ` Tushar Behera
2014-06-26 14:59                         ` Kevin Hilman
2014-06-26 14:59                           ` Kevin Hilman
2014-06-26 15:17                           ` Russell King - ARM Linux
2014-06-26 15:17                             ` Russell King - ARM Linux
2014-06-26 19:42                             ` Laura Abbott
2014-06-26 19:42                               ` Laura Abbott
2014-06-27  3:06                               ` Tushar Behera
2014-06-27  3:06                                 ` Tushar Behera
2014-06-27  9:09                                 ` Laura Abbott
2014-06-27  9:09                                   ` Laura Abbott
2014-06-27  9:40                                   ` Russell King - ARM Linux
2014-06-27  9:40                                     ` Russell King - ARM Linux
2014-06-26 17:04                           ` Andreas Färber
2014-06-26 17:04                             ` Andreas Färber
2014-06-27  3:28                             ` Tushar Behera
2014-06-27  3:28                               ` Tushar Behera
2014-06-25  2:07               ` Andreas Färber
2014-06-25  2:07                 ` Andreas Färber
